Guanheng George Zhang
Results: 42 comments of Guanheng George Zhang
I ran into this problem while constructing the enwik9 dataset. Based on the discussions in this issue, I have some ideas: - a list of sub-datasets - a list of byte offsets...
You might use an offset to materialize only the part of your data that fits into memory, like [here](https://github.com/pytorch/text/blob/master/torchtext/datasets/unsupervised_learning.py#L80). Do you use distributed data parallel to train the model? If so, you...
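
To illustrate the byte-offset idea mentioned in these comments, here is a minimal sketch in plain Python. It is not the torchtext implementation linked above; the helper names `read_span` and `iter_spans` are hypothetical, and a small temporary file stands in for a large corpus such as enwik9.

```python
import os
import tempfile


def read_span(path, offset, length):
    """Read up to `length` bytes starting at byte `offset`.

    Only the requested span is loaded, not the whole file.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def iter_spans(path, span_size):
    """Yield (offset, data) pairs covering the file in spans of at most `span_size` bytes."""
    total = os.path.getsize(path)
    for offset in range(0, total, span_size):
        yield offset, read_span(path, offset, span_size)


# Demo: a 10-byte temporary file stands in for the real dataset (assumption).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghij")
    demo_path = tmp.name

spans = list(iter_spans(demo_path, 4))
os.remove(demo_path)
```

Keeping a list of offsets like this would let each worker open the file and read only its own spans, which is one possible way to combine the approach with data-parallel training. Note that splitting on raw byte boundaries can cut a multi-byte character or a token in half, so a real pipeline would likely align offsets to line or document boundaries.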