text issues

Fix text_classification dataset import package

3

Change `torchtext.experimental.datasets` to `torchtext.datasets` for the latest version.
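
For code that has to run against both older and newer installs, one option is to try the new module path first and fall back to the old one. The helper below is a minimal sketch of that idea; `import_first_available` is a hypothetical name introduced here for illustration and is not part of torchtext or of this PR.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that imports cleanly, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # The module (or one of its parent packages) is missing; try the next name.
            continue
    return None
```

Under that assumption, a caller could write `datasets = import_first_available(["torchtext.datasets", "torchtext.experimental.datasets"])` and check the result for `None` before using it.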

cla signed

Added Windows build instructions

This PR adds Windows build instructions for the repo. The repo does not compile without the [MSVC x64 toolset](https://docs.microsoft.com/en-us/cpp/build/how-to-enable-a-64-bit-visual-cpp-toolset-on-the-command-line?view=msvc-160#use-vcvarsallbat-to-set-a-64-bit-hosted-build-architecture). The instructions given in the PR are the same way...

mstfbl

cla signed

[WIP] using std::array in vocab for additional speed-ups

1

We replace the container type of `stoi_`, changing it from `std::vector` to `std::array`. This brings a slight additional improvement in look-up speed.

parmeet

cla signed

Helper function for Vocab object

5

I found it hard to understand how to perform a "full loop" of Token -> index -> Token. I think the intended manner is to just access the itos variable...
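
The round trip described here can be sketched with a toy vocabulary. `ToyVocab`, `lookup_index`, and `lookup_token` are hypothetical names used only for illustration; this is not torchtext's `Vocab` class and not the helper this PR proposes. The only assumption carried over is the usual pairing of a token-to-index map (`stoi`) with an index-to-token list (`itos`).

```python
class ToyVocab:
    """Hypothetical stand-in for a vocabulary; not torchtext's Vocab class."""

    def __init__(self, tokens):
        # itos: index -> token (a list); stoi: token -> index (a dict).
        self.itos = list(dict.fromkeys(tokens))  # de-duplicate, keep first-seen order
        self.stoi = {tok: idx for idx, tok in enumerate(self.itos)}

    def lookup_index(self, token):
        return self.stoi[token]

    def lookup_token(self, index):
        return self.itos[index]


vocab = ToyVocab(["the", "cat", "sat", "the"])
index = vocab.lookup_index("cat")        # Token -> index
round_trip = vocab.lookup_token(index)   # index -> Token
```

Wrapping the `itos` access in a named method makes the reverse direction as discoverable as the forward one.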

TuckerBMorgan

cla signed

Doc string for the examples of datasets

1

Add the example doc strings to `torchtext.datasets`. ![Screen Shot 2021-03-11 at 9 27 23 PM](https://user-images.githubusercontent.com/6156351/110882897-a0e2b880-82b0-11eb-9945-40368bc63d40.png)

zhangguanheng66

cla signed

Update the format of the raw_datasets.json file to pass FB internal lint check

The contents of `raw_datasets.json` are not valid for the FB internal test. Update the format so that the file passes the internal lint check.

zhangguanheng66

cla signed

Fix the way tests are executed

Currently `torchtext` runs tests with `pytest test`. This is fine for development, but one cannot run unit tests on the installed package in this manner because running test from the root...

mthrok

cla signed

Add Trec dataset

1

The legacy Trec dataset was retired to the `torchtext.legacy` folder. This one yields the raw text strings.
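
To illustrate what "yields the raw text strings" can look like in practice, here is a minimal sketch of a generator that produces plain `(label, text)` string pairs with no tokenization or numericalization. The function name `iter_raw_examples`, the sample lines, and the "label, space, text" line layout are all assumptions made for this example; they do not describe the actual Trec file format or the code in this PR.

```python
def iter_raw_examples(lines):
    """Yield (label, text) pairs of plain strings from 'LABEL<space>text' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines rather than yielding empty examples.
            continue
        label, _, text = line.partition(" ")
        yield label, text


# Hypothetical sample data, made up for this sketch.
sample_lines = ["DESC How did serfdom develop ?", "", "LOC Where is the Louvre ?"]
examples = list(iter_raw_examples(sample_lines))
```

Leaving the text untouched keeps the choice of tokenizer and vocabulary with the caller.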

zhangguanheng66

cla signed

Add Stanford Sentiment Treebank (SST) dataset

1

The legacy SST dataset was retired to the `torchtext.legacy` folder. This one yields the raw text strings.

zhangguanheng66

cla signed

Add Natural Language Inference datasets

1

The following three datasets have been retired in the `legacy.datasets` folder. We are rewriting them to yield the raw texts:

- SNLI
- MatchedMultiNLI ([link](https://www.kaggle.com/c/multinli-matched-open-evaluation))
- MismatchedMultiNLI ([link](https://www.kaggle.com/c/multinli-mismatched-open-evaluation))

Unfortunately, the...

zhangguanheng66

cla signed
