text
Models, data loaders and abstractions for language processing, powered by PyTorch
Change `torchtext.experimental.datasets` to `torchtext.datasets` for the latest version.
This PR adds Windows build instructions for the repo. The repo does not compile without the [MSVC x64 toolset](https://docs.microsoft.com/en-us/cpp/build/how-to-enable-a-64-bit-visual-cpp-toolset-on-the-command-line?view=msvc-160#use-vcvarsallbat-to-set-a-64-bit-hosted-build-architecture). The instructions given in the PR are the same way...
We change the container type of `stoi_` from `std::vector` to `std::array`. This brings a slight additional improvement in look-up speed.
I found it hard to understand how to perform a "full loop" of Token -> index -> Token. I think the intended manner is to just access the itos variable...
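To make the Token -> index -> Token round trip concrete, here is a minimal sketch in plain Python. It does not use torchtext: `TinyVocab`, `lookup_index`, and `lookup_token` are hypothetical names chosen for illustration, and only the `stoi`/`itos` pairing mirrors what the description above refers to.

```python
from collections import Counter


class TinyVocab:
    """Hypothetical stand-in for a vocabulary object (not torchtext's class)."""

    def __init__(self, tokens, unk_token="<unk>"):
        counts = Counter(tokens)
        # itos maps index -> token: unknown token first, then tokens ordered
        # by descending frequency, with ties broken alphabetically.
        ordered = sorted(
            (t for t in counts if t != unk_token),
            key=lambda t: (-counts[t], t),
        )
        self.itos = [unk_token] + ordered
        # stoi is the inverse mapping: token -> index.
        self.stoi = {tok: idx for idx, tok in enumerate(self.itos)}
        self.unk_index = 0

    def lookup_index(self, token):
        # Tokens never seen at build time fall back to the unknown index.
        return self.stoi.get(token, self.unk_index)

    def lookup_token(self, index):
        return self.itos[index]


vocab = TinyVocab("the cat sat on the mat".split())
indices = [vocab.lookup_index(t) for t in ["the", "cat", "dog"]]
round_trip = [vocab.lookup_token(i) for i in indices]
```

In this sketch the out-of-vocabulary token `dog` maps to index 0 and comes back as `<unk>`, so the loop is lossless only for tokens that were in the vocabulary when it was built.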
Add the example doc strings to `torchtext.datasets`. 
For the FB internal test, the contents of `raw_datasets.json` are not valid. Update the format to pass the internal lint check.
Currently `torchtext` runs tests with `pytest test`. This is fine for development, but one cannot run unit tests on an installed package in this manner because running tests from the root...
The legacy Trec dataset was retired to the `torchtext.legacy` folder. This one yields the raw text strings.
The legacy SST dataset was retired to the `torchtext.legacy` folder. This one yields the raw text strings.
The following three datasets have been retired in the `legacy.datasets` folder. We are re-writing these by yielding the raw texts:
- SNLI
- MatchedMultiNLI ([link](https://www.kaggle.com/c/multinli-matched-open-evaluation))
- MismatchedMultiNLI ([link](https://www.kaggle.com/c/multinli-mismatched-open-evaluation))

Unfortunately, The...