text issues

'specials' not available in version 0.10.0

3

Hello everyone, I am trying to run some old code with torchtext. Due to package conflicts, I chose PyTorch 1.9.0 (cu102) + torchtext 0.10.0. I have already modified a lot of parts, like legacy, to make the code...

suice07

Saving and loading vocabulary

1

## ❓ Questions and Help **Description** I trained a classification model and used torchtext to create a vocabulary from a pre-trained model. My problem is that when saving the model, I...

laleye

[Nova] Simplify Caller Workflows

There's no need to have a matrix in the caller workflow. Let's just pass these inputs directly. We should do this for all caller workflows across all the repos as...

osalpekar

cla signed

args.pipeline_mode=pipe to use torch.distributed.pipeline.sync.Pipe

2

pbelevich

cla signed

Cannot run text_classifier end to end

2

## 🐛 Bug There's a minor issue with the `text_classifier` in the `examples` folder. When I run the `run_script.sh` it creates a `.data` folder, then the `train` command `python train.py...

david-waterworth

Add `LengthSetterIterDataPipe` to all torchtext datasets

2

## 🚀 Feature We want to add the [`LengthSetterIterDataPipe`](https://github.com/pytorch/data/blob/719616a1b4791034da3d888357e3ef62c70806e3/torchdata/datapipes/iter/util/header.py#L66-L67) to the end of all torchtext datasets. This will allow us to call `len()` on the datapipe object and prevent errors...

Nayef211

torchtext.transforms does not provide custom tokenization

2

## 🚀 Feature In version 0.13.0 we can use BertTokenizer, ClipTokenizer, etc., but we cannot use a custom tokenizer. **Motivation** GPT2 uses a different tokenization technique. Sometimes we want to use nltk...

pandya6988

Ensure `main` and `fbsync` are both in sync

## Description - Currently the `fbsync` branch is [172 commits ahead](https://github.com/pytorch/text/compare/main...fbsync), [749 commits behind](https://github.com/pytorch/text/compare/fbsync...main) main. - We want to ensure that `main` and `fbsync` branches are both up to date...

Nayef211

[DO NOT MERGE] Testing CI

Nayef211

cla signed

Libtorchtext Bert Model

Hi, is there a tutorial for the Libtorchtext BERT implementation? There are some scripts [here](https://github.com/pytorch/text/blob/main/examples/libtorchtext/tokenizer/main.cpp) about BERT in C++, but I couldn't find any example of how to use it...

EmreOzkose
