Guanheng George Zhang
We plan to eventually retire the `Field` class as legacy code. For now, however, we could land an OSS PR as an example to help with the use case above. @M-e-r-c-u-r-y
If you have the paths of train/test files for AG_NEWS and DBpedia, you could save them as a list and start from [here](https://github.com/pytorch/text/blob/43acc7534738ca6af91c19524667738d4b3f1fa3/torchtext/datasets/text_classification.py#L118). So why not just call the AG_NEWS...
It's an iterator, so I don't think you can split or shuffle it. I think it's worth adding an option to set the offset, i.e. the line to start from. So for the...
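Until such an option exists, the offset idea can be approximated with `itertools.islice` from the standard library. This is only a sketch: `take_range` and `make_iter` are hypothetical names, and the generator stands in for a real dataset iterator. Note that `islice` still reads and discards the skipped items, so it does not save any I/O the way a built-in offset would.

```python
from itertools import islice


def take_range(iterable, start, stop=None):
    """Yield items from position `start` up to (not including) `stop`."""
    return islice(iterable, start, stop)


def make_iter():
    # Hypothetical stand-in for a dataset iterator yielding (label, text).
    return ((i % 4 + 1, f"line {i}") for i in range(10))


# An iterator can only be consumed once, so a fresh one is created
# for each slice.
head = list(take_range(make_iter(), 0, 8))  # first 8 items
tail = list(take_range(make_iter(), 8))     # everything from item 8 onward
```

Because each slice needs its own fresh iterator, this only works when the iterator can be re-created cheaply, for example by calling the dataset constructor again.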
As a temporary solution, you can collect all the items from the iterator into a list and split that (if the dataset fits in memory).

```
train_data = list(train_iter)
```
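Once the items are in a list, shuffling and splitting are ordinary list operations. A minimal sketch, assuming the whole dataset fits in memory: `split_examples`, `valid_ratio`, and `seed` are hypothetical names, and the small `train_iter` built here stands in for a real dataset iterator.

```python
import random


def split_examples(examples, valid_ratio=0.1, seed=0):
    """Materialize an iterable, shuffle it, and return (train, valid) lists."""
    items = list(examples)             # caches the iterator in memory
    random.Random(seed).shuffle(items)  # seeded for a reproducible split
    n_valid = int(len(items) * valid_ratio)
    return items[n_valid:], items[:n_valid]


# Hypothetical stand-in for an iterator yielding (label, text) pairs.
train_iter = iter([(i % 4 + 1, f"text {i}") for i in range(20)])

train_split, valid_split = split_examples(train_iter, valid_ratio=0.2)
```

The result is two plain Python lists, not the original iterable dataset type, which is usually fine for feeding a `DataLoader` with a custom `collate_fn`.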
> Thanks! but i think if we did that we can not get it back to `_RawTextIterableDataset`, right?

I don't understand it. Here, we just cache the iterator. Search `train_list...
Here are a few Python wrappers:

- `encode_as_pieces` tokenizes a sentence into a list of tokens
- `encode_as_ids` tokenizes a sentence into a list of token ids

In general, this wrap...
@PetrochukM Just checking in to see which kinds of datasets you would propose.
@eedeleon Yes. There will be a release by the end of July (0.4.0). We are now planning to incorporate some common NLP models in torchtext (like torchvision) to support the...
@PetrochukM Thanks for the comments. For the next release, I will try to add a few new supervised learning datasets and a tutorial on constructing datasets with the new pattern. We still...
@fmassa what do you think about those pretrained models?