Models, data loaders and abstractions for language processing, powered by PyTorch

Results 221 text issues

This is a PR for the new torchtext API in a machine translation use case. It includes:

- A sample showing how to build character and word representations
- Embedding model for character...

@zhangguanheng66 I'm proposing a sampler class with functionality similar to the [BucketIterator](https://github.com/pytorch/text/blob/bcb9104680eb9dc978a6bbcc2b9ca46cf2bdbed9/torchtext/data/iterator.py#L241). Let me know what you think of this. Thanks!
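For context, the bucketing idea behind such a sampler can be sketched in plain Python. This is only an illustration, not the code from the PR: the function name `bucket_batches` and the parameters `bucket_size_multiplier` and `seed` are hypothetical. It shuffles the indices, sorts each large chunk by length, and cuts the chunk into batches so that examples of similar length end up together.

```python
import random
from typing import Callable, Iterator, List, Sequence


def bucket_batches(
    data: Sequence,
    batch_size: int,
    sort_key: Callable = len,
    bucket_size_multiplier: int = 100,  # hypothetical knob: bucket = batch_size * this
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield lists of indices whose examples have similar sort_key values."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)  # randomize which examples share a bucket

    bucket_size = batch_size * bucket_size_multiplier
    batches: List[List[int]] = []
    for start in range(0, len(indices), bucket_size):
        # Sort only within the bucket, so batches stay length-homogeneous
        # while the overall order remains partly random.
        bucket = sorted(
            indices[start:start + bucket_size],
            key=lambda i: sort_key(data[i]),
        )
        for j in range(0, len(bucket), batch_size):
            batches.append(bucket[j:j + batch_size])

    rng.shuffle(batches)  # avoid always serving short batches first
    yield from batches
```

Because the function yields lists of indices, the result of `list(bucket_batches(dataset, batch_size=32))` has the shape that the `batch_sampler` argument of `torch.utils.data.DataLoader` expects, though how the proposed class integrates with the data loader is not stated in the snippet above.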

### Documentation variable error

Change `ret = vec.get_vecs_by_tokens(tokens, lower_case_backup=True)` to `ret = vec.get_vecs_by_tokens(examples, lower_case_backup=True)`. The `tokens` variable is not defined in the example.

This PR adds most of the methods defined in the SentencePieceProcessor Python wrapper. ~~Blocked by https://github.com/pytorch/pytorch/pull/38167~~

- `NBestEncodeAsPieces`
- `NBestEncodeAsIds`
- `SampleEncodeAsPieces`
- `SampleEncodeAsIds`
- `DecodePieces`
- `DecodeIds`
- `GetPieceSize`
- `PieceToId`...

cla signed

The current code introduces five generic functions. `vocab_func` returns a function that calls `__getitem__` on each entry of a given list using a particular vocab object....
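The behaviour described for `vocab_func` can be sketched as a small closure. This is a minimal sketch based only on the sentence above, not the code from the PR; a plain `dict` stands in for the vocab object, on the assumption that the real vocab supports `vocab[token]` lookups.

```python
from typing import Callable, List


def vocab_func(vocab) -> Callable[[List[str]], List[int]]:
    """Return a function that maps each token in a list through vocab[token]."""
    def func(tokens: List[str]) -> List[int]:
        # vocab[token] invokes vocab.__getitem__(token) for every entry.
        return [vocab[token] for token in tokens]
    return func


# Usage with a dict standing in for a vocab object (an assumption).
to_ids = vocab_func({"hello": 0, "world": 1})
ids = to_ids(["world", "hello"])
```

Returning a function rather than the ids themselves lets the lookup be composed with other per-example transforms, such as a tokenizer, in a single pipeline.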

cla signed

Bugfix: https://github.com/pytorch/text/issues/618, https://github.com/pytorch/text/issues/706. This change adds an `unk_token` argument to the `build_vocab` method so that the unknown token can be set by `Field`. For backward compatibility, this PR leaves `Vocab.UNK` as the default token.

cla signed

Fixes #645 - Added WMT News Crawl dataset for language modeling

Delegate the `unk_token` to arguments when constructing the vocabulary. Fixes #618, a relatively major issue.
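The idea of passing the unknown token as an argument at vocabulary construction time can be illustrated with a toy builder. This is a hedged sketch, not torchtext's implementation: `build_vocab` and `lookup` here are hypothetical standalone functions, and the `"<unk>"` default is chosen for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocab(
    token_lists: Iterable[List[str]],
    unk_token: str = "<unk>",  # caller-supplied instead of hard-coded
    min_freq: int = 1,
) -> Dict[str, int]:
    """Build a token-to-index map with the unknown token at index 0."""
    counter = Counter(tok for tokens in token_lists for tok in tokens)
    itos = [unk_token]
    # Most frequent tokens first; ties are broken alphabetically.
    for tok, freq in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok != unk_token:
            itos.append(tok)
    return {tok: idx for idx, tok in enumerate(itos)}


def lookup(stoi: Dict[str, int], token: str, unk_token: str = "<unk>") -> int:
    """Return the index of token, falling back to the unknown token's index."""
    return stoi.get(token, stoi[unk_token])


# Usage: a custom unknown token is honoured end to end.
stoi = build_vocab([["a", "b", "a"]], unk_token="[UNK]")
oov_index = lookup(stoi, "zzz", unk_token="[UNK]")
```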