text
Models, data loaders and abstractions for language processing, powered by PyTorch
`find_match` searches a list of strings and returns the first entry that partially or fully contains the given match string.
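A minimal sketch of the behaviour described above, not necessarily the library's exact code. The parameter names, the type hints, the `None` return on no match, and the example paths are all assumptions made for illustration.

```python
from typing import List, Optional


def find_match(match: str, candidates: List[str]) -> Optional[str]:
    """Return the first string in `candidates` that contains `match`.

    Returning None when nothing matches is an assumption of this sketch.
    """
    for candidate in candidates:
        # A substring test covers both partial and full matches.
        if match in candidate:
            return candidate
    return None


# Hypothetical file list, used only to illustrate the lookup.
files = ["aclImdb/train/pos", "aclImdb/test/neg", "aclImdb/test/pos"]
first_test = find_match("test", files)   # first entry containing "test"
missing = find_match("valid", files)     # no entry contains "valid"
```

Because the search stops at the first hit, the order of the list determines which entry is returned when several entries contain the match string.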
This PR changes the IMDB download to use the stored filename to detect whether the data has already been downloaded. This avoids unnecessary queries to Google Drive.
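The idea of checking for an existing file before downloading can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `download_if_missing`, `fake_fetch`, and the archive filename are hypothetical, and the real download helper in the library is not shown here.

```python
import os
import tempfile
from typing import Callable


def download_if_missing(path: str, fetch: Callable[[str], None]) -> bool:
    """Call `fetch(path)` only when no file exists at `path`.

    Returns True if a fetch was performed, False if it was skipped.
    Hypothetical helper, for illustration only.
    """
    if os.path.exists(path):
        return False  # data already on disk, so skip the remote request
    fetch(path)
    return True


def fake_fetch(path: str) -> None:
    # Stand-in for a real download: writes a placeholder file locally.
    with open(path, "w") as handle:
        handle.write("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    # The filename here is illustrative, not taken from the PR.
    target = os.path.join(tmp, "imdb_archive.tar.gz")
    first = download_if_missing(target, fake_fetch)   # file absent, fetch runs
    second = download_if_missing(target, fake_fetch)  # file present, fetch skipped
```

Keying the check on the filename alone does not verify the file's contents, so a sketch like this would treat a truncated download as complete unless a checksum is also compared.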
This is a prototype pretrained XLM-R model based on the RoBERTa encoder. There are a few features that we would like to highlight and collect feedback on: - The basic nn...
In the [XLM-R](https://arxiv.org/pdf/1911.02116.pdf) model, SentencePiece is used to tokenize the strings. We enable the sentencepiece processing pipeline here for the BERT workflow.
On top of https://github.com/pytorch/text/pull/1027, add a `__setitem__` function to `torchtext.experimental.vocab.Vocab`. A `__delitem__` function is added as well. [RuntimeError] if the token exists, an error message is sent out and asks users...
Remove the unk tensor and allow users to add one if necessary.
This PR removes the default `'<unk>'` token, along with its index, from `experimental.vocab`. Fixes https://github.com/pytorch/text/issues/1016. In the experimental vocabulary, there will be no special symbols or user reserved...
Begin trying to use the new interface in https://github.com/pytorch/pytorch/pull/45645