text issues

Experimental machine translation example

11

This PR applies the new torchtext API to a machine translation use case. It includes:
- A sample showing how to build character and word representations
- An embedding model for character...

akurniawan

Experimental bucket by sequence length sampler

15

@zhangguanheng66 I'm proposing a sampler class with functionality similar to the [BucketIterator](https://github.com/pytorch/text/blob/bcb9104680eb9dc978a6bbcc2b9ca46cf2bdbed9/torchtext/data/iterator.py#L241). Let me know what you think of this. Thanks!

akurniawan
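
To make the bucket-by-sequence-length idea above concrete, here is a minimal pure-Python sketch. It is not the sampler from the PR: the class name `BucketByLengthBatchSampler` and the parameters `bucket_size_multiplier` and `seed` are hypothetical, chosen only for illustration. It assumes the usual approach of splitting shuffled indices into large buckets, sorting each bucket by length, and slicing it into batches.

```python
import random
from typing import Iterator, List, Sequence


class BucketByLengthBatchSampler:
    """Yield batches of dataset indices whose sequences have similar lengths.

    Hypothetical illustration only; names and defaults are assumptions.
    """

    def __init__(
        self,
        lengths: Sequence[int],
        batch_size: int,
        bucket_size_multiplier: int = 100,
        shuffle: bool = True,
        seed: int = 0,
    ) -> None:
        self.lengths = list(lengths)
        self.batch_size = batch_size
        # Each bucket holds several batches' worth of examples.
        self.bucket_size = batch_size * bucket_size_multiplier
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self) -> Iterator[List[int]]:
        indices = list(range(len(self.lengths)))
        if self.shuffle:
            self.rng.shuffle(indices)

        batches: List[List[int]] = []
        for start in range(0, len(indices), self.bucket_size):
            # Sort only within the bucket, so batches stay length-homogeneous
            # while the overall order remains mostly random.
            bucket = sorted(
                indices[start:start + self.bucket_size],
                key=lambda i: self.lengths[i],
            )
            for b in range(0, len(bucket), self.batch_size):
                batches.append(bucket[b:b + self.batch_size])

        if self.shuffle:
            # Shuffle batch order so training does not see short batches first.
            self.rng.shuffle(batches)
        return iter(batches)

    def __len__(self) -> int:
        # Number of batches (the last one may be smaller than batch_size).
        return (len(self.lengths) + self.batch_size - 1) // self.batch_size
```

Because the object is simply an iterable of index lists, it should be usable as the `batch_sampler` argument of `torch.utils.data.DataLoader`, although that pairing is untested here. Sorting within buckets rather than globally trades a little extra padding for more randomness between epochs.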

[Draft]some tests

1

AITutorials

cla signed

Update vocab.py

### Documentation variable error

Changes `ret = vec.get_vecs_by_tokens(tokens, lower_case_backup=True)` to `ret = vec.get_vecs_by_tokens(examples, lower_case_backup=True)`. The `tokens` variable is not defined in the example.

shangeth
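
For readers unfamiliar with the `lower_case_backup` flag in the call above, the following toy sketch imitates the lookup with a plain dictionary instead of downloaded pretrained vectors. The helper `lookup_vectors`, the table contents, and the zero-vector fallback for unknown tokens are all assumptions made for illustration, not torchtext code.

```python
from typing import Dict, List, Sequence


def lookup_vectors(
    table: Dict[str, List[float]],
    tokens: Sequence[str],
    dim: int,
    lower_case_backup: bool = False,
) -> List[List[float]]:
    """Hypothetical stand-in for a vector lookup with a lower-case fallback."""
    vectors = []
    for token in tokens:
        if token in table:
            # Exact-case match is always tried first.
            vectors.append(table[token])
        elif lower_case_backup and token.lower() in table:
            # Fall back to the lower-cased token only when requested.
            vectors.append(table[token.lower()])
        else:
            # Assumption of this sketch: unknown tokens map to a zero vector.
            vectors.append([0.0] * dim)
    return vectors


# Toy data; a real run would use pretrained vectors instead.
toy_table = {"chip": [1.0, 2.0], "baby": [3.0, 4.0], "beautiful": [5.0, 6.0]}
examples = ["chip", "baby", "Beautiful"]

ret = lookup_vectors(toy_table, examples, dim=2, lower_case_backup=True)
```

With the fallback enabled, `"Beautiful"` resolves to the entry stored under `"beautiful"`; without it, the same token falls through to the unknown-token vector.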

Use torch.testing._internal.common_utils.TestCase

cpuhrsch

cla signed

Add methods defined on SentencePieceProcessor

5

This PR adds most of the methods defined in the SentencePieceProcessor Python wrapper. ~~Blocked by https://github.com/pytorch/pytorch/pull/38167~~
- `NBestEncodeAsPieces`
- `NBestEncodeAsIds`
- `SampleEncodeAsPieces`
- `SampleEncodeAsIds`
- `DecodePieces`
- `DecodeIds`
- `GetPieceSize`
- `PieceToId`...

mthrok

cla signed

[WIP] Simplifications and code formatting for experimental text classification datasets

4

The current code introduces five generic functions:
- `vocab_func`: returns a function that calls `__getitem__` on each entry of a given list using a particular vocab object....

cpuhrsch

cla signed
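
The `vocab_func` helper described in the entry above can be sketched in a few lines. This is an illustration written from that one-sentence description, not the code from the PR, and it uses a plain dictionary as a stand-in for a vocab object.

```python
from typing import Callable, List, Mapping, Sequence


def vocab_func(vocab: Mapping[str, int]) -> Callable[[Sequence[str]], List[int]]:
    """Return a function that maps each token in a list to vocab[token]."""

    def func(tokens: Sequence[str]) -> List[int]:
        # vocab[token] invokes vocab.__getitem__(token) for every entry.
        return [vocab[token] for token in tokens]

    return func


# A dict stands in for the vocab here. Unlike a vocabulary with an unknown-token
# fallback, a dict raises KeyError on tokens it has not seen.
toy_vocab = {"hello": 0, "world": 1}
to_ids = vocab_func(toy_vocab)
ids = to_ids(["world", "hello", "world"])
```

Returning a closure keeps the vocab bound once, so the resulting function can be composed with other per-example transforms such as tokenization.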

Bugfix: custom UNK token conversion error

Bugfix: https://github.com/pytorch/text/issues/618, https://github.com/pytorch/text/issues/706. This change adds an `unk_token` argument to the `build_vocab` method so that it can be set by `Field`. For backward compatibility, this PR keeps `Vocab.UNK` as the default token.

ohke