Guanheng George Zhang

Results 42 comments of Guanheng George Zhang

We plan to eventually retire the `Field` class as legacy code. However, at this moment, we could land an OSS PR as an example to help with the use case above. @M-e-r-c-u-r-y

If you have the paths of train/test files for AG_NEWS and DBpedia, you could save them as a list and start from [here](https://github.com/pytorch/text/blob/43acc7534738ca6af91c19524667738d4b3f1fa3/torchtext/datasets/text_classification.py#L118). So why not just call the AG_NEWS...

It's an iterator, so I don't think you can split/shuffle it. I think it's worth adding an option to set the offset or the starting line. So for the...

As a temporary solution, you can collect all the items from the iterator into a list and split them (if the dataset fits in memory).

```
train_data = list(train_iter)
```
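A minimal sketch of the list-based workaround, extended with a shuffle and a train/validation split. The helper name `split_iterable`, the `train_fraction` and `seed` parameters, and the toy `(label, text)` tuples are assumptions for illustration; they are not part of torchtext, and a real dataset iterator would take the place of `toy_iter`.

```python
import random


def split_iterable(data_iter, train_fraction=0.9, seed=0):
    """Materialize an iterator into a list, shuffle it, and split it in two.

    Hypothetical helper: only usable when the whole dataset fits in memory.
    """
    items = list(data_iter)          # exhausts the iterator and caches its items
    rng = random.Random(seed)        # local RNG so the split is reproducible
    rng.shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


# Toy stand-in for a raw dataset iterator yielding (label, text) pairs.
toy_samples = [(i % 4 + 1, f"sample text {i}") for i in range(10)]
toy_iter = iter(toy_samples)

train_data, valid_data = split_iterable(toy_iter, train_fraction=0.8)
```

Because the result is a pair of plain lists, both parts can be indexed, reshuffled, or iterated over more than once, which a one-pass iterator does not allow.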

> Thanks! but i think if we did that we can not get it back to `_RawTextIterableDataset`, right?

I don't understand it. Here, we just cache the iterator. Search `train_list...

Here are a few Python wrappers:

- `encode_as_pieces` tokenizes a sentence into a list of tokens
- `encode_as_ids` tokenizes a sentence into a list of token IDs.

In general, this wrap...

@PetrochukM just checking in to see which kinds of datasets you would propose.

@eedeleon Yes. There will be a release by the end of July (0.4.0). We are now planning to incorporate some common NLP models in torchtext (like torchvision) to support the...

@PetrochukM Thanks for the comments. For the next release, I will try to add a few new supervised learning datasets and a tutorial on constructing a dataset with the new pattern. We still...