
Models, data loaders and abstractions for language processing, powered by PyTorch

Results 221 text issues

`find_match` searches a list of strings and returns the first entry that partially or fully contains the given match string.
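A minimal sketch of the behaviour described above, assuming a plain substring test. The argument order, the `candidates` name, and the `None` return for no match are assumptions for illustration, not the signature used in the PR.

```python
from typing import Optional, Sequence


def find_match(match: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate containing `match`, or None if none does.

    Hypothetical signature: the real helper may order or name its
    arguments differently.
    """
    for candidate in candidates:
        # `in` covers both partial and full matches on strings.
        if match in candidate:
            return candidate
    return None


first = find_match("train", ["test.txt", "train.txt", "train_v2.txt"])
```

Because the scan stops at the first hit, the order of the list decides which entry wins when several contain the match string.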

cla signed

This PR changes the IMDB download to use the stored filename to detect whether the data has already been downloaded. This further prevents unnecessary queries to Google Drive.
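The idea can be sketched as a check on the expected local path before any network request is made. The helper name `download_if_missing` and the `fetch` callback are hypothetical and stand in for whatever download routine the dataset code actually uses.

```python
import os
from typing import Callable


def download_if_missing(root: str, filename: str,
                        fetch: Callable[[str], None]) -> str:
    """Return the local path for `filename`, fetching it only if absent.

    `fetch` is a hypothetical callback that writes the file to the
    given path; it is only called when the file is not already there.
    """
    path = os.path.join(root, filename)
    if os.path.exists(path):
        # Already downloaded: skip the remote query entirely.
        return path
    os.makedirs(root, exist_ok=True)
    fetch(path)
    return path
```

Keying the check on the known filename means a second call for the same dataset returns immediately without contacting the remote host.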

cla signed

This can reduce build time

cla signed

This is a prototype pretrained XLM-R model based on the RoBERTa encoder. There are a few features that we would like to highlight and collect feedback on: - The basic nn...

cla signed

In the [XLM-R](https://arxiv.org/pdf/1911.02116.pdf) model, SentencePiece is used to tokenize the strings. We enable the sentencepiece processing pipeline here for the BERT workflow.

cla signed

Builds on https://github.com/pytorch/text/pull/1027. Adds a `__setitem__` method to `torchtext.experimental.vocab.Vocab`; a `__delitem__` method is added as well. [RuntimeError] If the token exists, an error message is raised asking users...
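A toy illustration of the two dunder methods on a plain Python class. This is not the torchtext implementation: `ToyVocab`, its `itos`/`stoi` attributes, and the choice to treat `vocab[token] = index` as an insertion at that index are all assumptions. Only the `RuntimeError` for an existing token comes from the description above.

```python
from typing import Dict, List


class ToyVocab:
    """Hypothetical stand-in for a vocab, for illustration only."""

    def __init__(self, tokens: List[str]) -> None:
        self.itos: List[str] = list(tokens)
        self._rebuild()

    def _rebuild(self) -> None:
        # Recompute token -> index after any insertion or deletion.
        self.stoi: Dict[str, int] = {t: i for i, t in enumerate(self.itos)}

    def __len__(self) -> int:
        return len(self.itos)

    def __getitem__(self, token: str) -> int:
        return self.stoi[token]

    def __setitem__(self, token: str, index: int) -> None:
        if token in self.stoi:
            raise RuntimeError(f"Token {token!r} already exists in the vocab.")
        if not 0 <= index <= len(self.itos):
            raise RuntimeError(f"Index {index} is out of range.")
        # Assumed semantics: insert and shift later tokens by one.
        self.itos.insert(index, token)
        self._rebuild()

    def __delitem__(self, token: str) -> None:
        if token not in self.stoi:
            raise RuntimeError(f"Token {token!r} is not in the vocab.")
        self.itos.remove(token)
        self._rebuild()


vocab = ToyVocab(["hello", "world"])
vocab["<pad>"] = 0   # insert a new token at index 0
del vocab["world"]   # remove an existing token
```

Rebuilding the lookup table after each change keeps the indices contiguous, at the cost of linear work per edit, which is acceptable for a sketch.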

cla signed

Remove the unk tensor and allow users to add one if necessary.

cla signed

This PR removes the default `''` token, along with its index, from `experimental.vocab`. Fixes https://github.com/pytorch/text/issues/1016. In the experimental vocabulary, there will be no special symbols or user reserved...

cla signed

Begin trying to use the new interface in https://github.com/pytorch/pytorch/pull/45645

cla signed