Guanheng George Zhang
Hey @bentrevett, thanks for your tutorial. Since torchtext has updated its datasets with the new abstraction, I'm wondering if you plan to update the tutorial here. One of the users...
Add example docstrings to `torchtext.datasets`.
For the FB internal tests, the contents of `raw_datasets.json` are not valid. Update the format to pass the internal lint check.
The legacy TREC dataset has been retired to the `torchtext.legacy` folder. This version yields the raw text strings.
The legacy SST dataset has been retired to the `torchtext.legacy` folder. This version yields the raw text strings.
The following three datasets have been retired to the `legacy.datasets` folder. We are re-writing them to yield the raw text: - SNLI - MatchedMultiNLI ([link](https://www.kaggle.com/c/multinli-matched-open-evaluation)) - MismatchedMultiNLI ([link](https://www.kaggle.com/c/multinli-mismatched-open-evaluation)) Unfortunately, the...
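To illustrate the "raw text" abstraction these rewrites move toward, here is a minimal sketch: instead of pre-numericalized fields, the dataset is a plain iterator that yields `(label, raw_text)` pairs and leaves tokenization entirely to the user. The function and sample data below are hypothetical illustrations, not the actual torchtext implementation.

```python
# Hypothetical sketch of a raw-text dataset: an iterator over
# (label, raw_text) tuples, with no built-in tokenization.
def raw_text_dataset(rows):
    """Yield (label, raw_text) tuples from in-memory rows."""
    for label, text in rows:
        yield label, text

# Toy premise/hypothesis-style rows standing in for the real data files.
sample_rows = [
    ("entailment", "A man is playing a guitar."),
    ("contradiction", "Nobody is playing music."),
]

train_iter = raw_text_dataset(sample_rows)
first_label, first_text = next(train_iter)
```

The user then composes their own tokenizer and vocab on top of the raw strings, which is the key difference from the legacy field-based datasets.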
This is a prototype pretrained XLM-R model based on the RoBERTa encoder. There are a few features that we would like to highlight and collect feedback on: - The basic nn...
In the [XLM-R](https://arxiv.org/pdf/1911.02116.pdf) model, SentencePiece is used to tokenize the strings. We enable the SentencePiece processing pipeline here for the BERT workflow.
On top of https://github.com/pytorch/text/pull/1027. Adds a `__setitem__` method to `torchtext.experimental.vocab.Vocab`; a `__delitem__` method is added as well. [RuntimeError] if the token already exists, an error message is raised asking users...
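The intended semantics can be sketched with a minimal dict-backed vocab. The class name and exact error message below are assumptions based on the PR summary (assigning to an existing token raises `RuntimeError` rather than silently overwriting), not the actual torchtext code.

```python
# Hypothetical sketch of the proposed Vocab mutation semantics.
class MutableVocab:
    def __init__(self, tokens):
        self._stoi = {tok: i for i, tok in enumerate(tokens)}

    def __getitem__(self, token):
        return self._stoi[token]

    def __setitem__(self, token, index):
        # Per the PR summary: if the token already exists, raise a
        # RuntimeError instead of silently overwriting its index.
        if token in self._stoi:
            raise RuntimeError(f"token {token!r} already exists in the vocab")
        self._stoi[token] = index

    def __delitem__(self, token):
        del self._stoi[token]

v = MutableVocab(["hello", "world"])
v["pytorch"] = 2   # adding a new token succeeds
del v["world"]     # __delitem__ removes a token
```

Attempting `v["hello"] = 5` afterwards would raise `RuntimeError`, which forces callers to delete a token before re-assigning its index.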
Remove the unk tensor and allow users to add one if necessary.
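A minimal sketch of what this change implies for lookups, under assumed names: with no built-in `<unk>` entry, an out-of-vocabulary lookup fails unless the user registers a fallback index themselves (in the spirit of a `set_default_index`-style hook). This is an illustration of the behavior, not the torchtext implementation.

```python
# Hypothetical sketch: vocab without a built-in unk entry; the user
# opts in to a fallback index for out-of-vocabulary tokens.
class FallbackVocab:
    def __init__(self, tokens):
        self._stoi = {tok: i for i, tok in enumerate(tokens)}
        self._default = None

    def set_default_index(self, index):
        self._default = index

    def __getitem__(self, token):
        if token in self._stoi:
            return self._stoi[token]
        if self._default is None:
            raise RuntimeError(f"token {token!r} not found and no default index set")
        return self._default

v = FallbackVocab(["<unk>", "hello"])
v.set_default_index(v["<unk>"])   # user explicitly wires up the unk fallback
idx = v["missing"]                # falls back to the <unk> index
```

Without the `set_default_index` call, the same lookup raises, which makes the unk behavior an explicit user choice rather than a hidden default.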