text
Models, data loaders and abstractions for language processing, powered by PyTorch
Edit the raw.translation dataset to return a RawTextIterableDataset, which uses worker information to restrict the underlying iterator to a subset so that DataLoader won't return duplicate entries, if given an instance...
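A minimal sketch of the sharding idea. It assumes a simple round-robin split; the PR summary does not say how the subset is chosen. `shard_for_worker` is a hypothetical helper. Inside a real `IterableDataset.__iter__`, `worker_id` and `num_workers` would come from `torch.utils.data.get_worker_info()`, which returns `None` in the main process.

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard_for_worker(items: Iterable[T], worker_id: int, num_workers: int) -> Iterator[T]:
    """Yield every num_workers-th item, starting at index worker_id.

    Hypothetical helper: each worker sees a disjoint stride of the stream,
    so across all workers every item is produced exactly once.
    """
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    return itertools.islice(items, worker_id, None, num_workers)
```

With three workers over seven lines, worker 0 gets lines 0, 3 and 6, and no line is returned twice.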
Update docstrings and clarify that the file objects supported by the functions must be files opened in text mode. A followup task is to add support for...
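A hypothetical sketch of what "opened in text mode" means in practice: a text-mode file object such as the one returned by `open(path, "r")` or an `io.StringIO` is an `io.TextIOBase` and yields `str`, while a binary-mode object yields `bytes`. The function name `read_lines` is illustrative, not torchtext's API.

```python
import io
from typing import List, TextIO


def read_lines(f: TextIO) -> List[str]:
    # Accept only text-mode file objects; binary ones (e.g. open(path, "rb"))
    # would yield bytes instead of str.
    if not isinstance(f, io.TextIOBase):
        raise TypeError("expected a file object opened in text mode, e.g. open(path, 'r')")
    # Strip the trailing newline from each line.
    return [line.rstrip("\n") for line in f]
```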
This PR typedefs strings that are meant to be constant, i.e. read-only. They can then be optionally replaced by std::string_view. Also adds the ever so important "#pragma once" to the...
Create a single union regular expression that uses a lambda to query a dictionary of patterns for the correct replacement. This yields a significant speedup, but its behavior differs from the...
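A sketch of the single-pass technique. It assumes the dictionary keys are literal strings rather than arbitrary regexes, because with regex keys the matched text cannot be looked up in the dictionary directly. The replacement table and `normalize` are hypothetical. Since all patterns are tried in one scan, the output can differ from applying the replacements one after another, for example when one replacement produces text that another pattern would match.

```python
import re

# Hypothetical replacement table: literal substrings mapped to their replacements.
REPLACEMENTS = {
    "<br />": " ",
    "'s": " 's",
    "n't": " n't",
}

# One alternation over every key. Python tries alternatives left to right,
# so sorting longer keys first makes the longest literal win at a position.
_UNION = re.compile(
    "|".join(re.escape(k) for k in sorted(REPLACEMENTS, key=len, reverse=True))
)


def normalize(text: str) -> str:
    # The lambda looks up the matched text to find its replacement.
    return _UNION.sub(lambda m: REPLACEMENTS[m.group(0)], text)
```

For example, `normalize("it isn't<br />John's")` returns `"it is n't John 's"` after a single scan of the input.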
Right now the default is sometimes "train", "test", "valid" and sometimes (more commonly) "train", "valid", "test". We should pick a single convention (this PR opts for the latter) to...
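A hypothetical sketch of enforcing one convention: whatever order the caller passes, the helper returns the requested splits in the canonical "train", "valid", "test" order, so results can be unpacked positionally. `CANONICAL_SPLITS` and `normalize_split` are illustrative names, not taken from the PR.

```python
from typing import Iterable, Tuple, Union

# Canonical order, following the "train", "valid", "test" convention.
CANONICAL_SPLITS: Tuple[str, ...] = ("train", "valid", "test")


def normalize_split(split: Union[str, Iterable[str]] = CANONICAL_SPLITS) -> Tuple[str, ...]:
    # Allow a single split name as shorthand.
    if isinstance(split, str):
        split = (split,)
    requested = set(split)
    unknown = requested - set(CANONICAL_SPLITS)
    if unknown:
        raise ValueError(f"unknown split(s): {sorted(unknown)}")
    # Always return splits in canonical order, regardless of input order.
    return tuple(s for s in CANONICAL_SPLITS if s in requested)
```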
Language modeling datasets construct *all* datasets even if only a subset is requested. It also stores the fully numericalized version of the dataset if it's stored as "a single line"...
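A hypothetical sketch of the two concerns. The first function builds only the requested splits, with the `builders` mapping standing in for per-split loading logic. The second numericalizes lazily, one line at a time, instead of keeping the whole numericalized corpus in memory. Both names are illustrative assumptions, not the PR's code.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Mapping


def build_requested_splits(
    builders: Mapping[str, Callable[[], List[str]]],
    split: Iterable[str],
) -> Dict[str, List[str]]:
    # Call only the builders for the splits the caller asked for.
    return {name: builders[name]() for name in split}


def numericalize_lazily(
    lines: Iterable[str], vocab: Mapping[str, int], unk: int = 0
) -> Iterator[List[int]]:
    # Yield one numericalized line at a time; unknown tokens map to `unk`.
    for line in lines:
        yield [vocab.get(tok, unk) for tok in line.split()]
```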
a) The documentation doesn't clearly state that one factory function is meant to be used to construct a Vocabulary from a dataset (e.g. AG_NEWS) and another is meant to be...
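A hypothetical sketch of the dataset-oriented case only. It assumes the dataset yields `(label, text)` pairs and uses whitespace tokenization. `vocab_from_dataset` is an illustrative name, not torchtext's API, and the vocabulary is modeled as a plain token-to-index dictionary.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def vocab_from_dataset(
    examples: Iterable[Tuple[int, str]],
    min_freq: int = 1,
    specials: Tuple[str, ...] = ("<unk>",),
) -> Dict[str, int]:
    # Count whitespace-separated tokens across all (label, text) examples.
    counts: Counter = Counter()
    for _label, text in examples:
        counts.update(text.split())
    # Specials come first, then tokens by descending frequency (ties alphabetical).
    itos = list(specials) + sorted(
        (t for t, c in counts.items() if c >= min_freq and t not in specials),
        key=lambda t: (-counts[t], t),
    )
    return {tok: idx for idx, tok in enumerate(itos)}
```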
Since torchtext is built against specific torch versions, the matching torch version needs to be specified in setup.py. Otherwise pip may install an incompatible version. Fixes #902
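A minimal sketch of pinning the torch dependency at build time. It assumes the build environment exports the torch version the package was compiled against. The environment variable name `PYTORCH_VERSION` and the helper `torch_requirement` are assumptions, not taken from the PR.

```python
import os


def torch_requirement(env_var: str = "PYTORCH_VERSION") -> str:
    """Return the torch requirement string to place in install_requires."""
    version = os.environ.get(env_var)
    # Pin to the exact version used for the build when it is known;
    # otherwise fall back to an unpinned requirement.
    return f"torch=={version}" if version else "torch"


# In setup.py this would be used as, for example:
# setup(..., install_requires=[torch_requirement()])
```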