text
Models, data loaders and abstractions for language processing, powered by PyTorch
Hello everyone, I am trying to run some old code with torchtext. Due to package conflicts, I chose PyTorch 1.9.0 (cu102) with torchtext 0.10.0. I have already modified many parts, such as legacy, to make the code...
## ❓ Questions and Help **Description** I trained a classification model and used torchtext to create a vocabulary from a pre-trained model. My problem is that when saving the model, I...
There's no need to have a matrix in the caller workflow. Let's just pass these inputs directly. We should do this for all caller workflows across all the repos as...
## 🐛 Bug There's a minor issue with the `text_classifier` in the `examples` folder. When I run the `run_script.sh` it creates a `.data` folder, then the `train` command `python train.py...
## 🚀 Feature We want to add the [`LengthSetterIterDataPipe`](https://github.com/pytorch/data/blob/719616a1b4791034da3d888357e3ef62c70806e3/torchdata/datapipes/iter/util/header.py#L66-L67) to the end of all torchtext datasets. This will allow us to call `len()` on the datapipe object and prevent errors...
## 🚀 Feature In version 0.13.0 we can use BertTokenizer, ClipTokenizer, etc., but we cannot use a custom tokenizer. **Motivation** GPT2 uses a different tokenization technique. Sometimes we want to use nltk...
## Description - Currently the `fbsync` branch is [172 commits ahead](https://github.com/pytorch/text/compare/main...fbsync), [749 commits behind](https://github.com/pytorch/text/compare/fbsync...main) main. - We want to ensure that `main` and `fbsync` branches are both up to date...
Hi, is there a tutorial for the Libtorchtext BERT implementation? There are some scripts [here](https://github.com/pytorch/text/blob/main/examples/libtorchtext/tokenizer/main.cpp) about BERT in C++, but I couldn't find any example of how to use it...