Guanheng George Zhang

42 comments by Guanheng George Zhang

I ran into this problem when constructing the enwik9 dataset. Based on the discussions in this issue, I have some ideas: - a list of sub-datasets - a list of byte offsets...

You might use an offset to materialize only the part of your data that fits into memory, like [here](https://github.com/pytorch/text/blob/master/torchtext/datasets/unsupervised_learning.py#L80). Do you use distributed data parallel for training the model? If so, you...
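
As a rough illustration of the byte-offset idea mentioned in these comments, here is a minimal sketch in plain Python. It is not the torchtext implementation behind the link above; the helper names `build_line_offsets` and `read_lines` are hypothetical, and the sketch assumes a newline-delimited UTF-8 text file. The file is scanned once to record where each line starts, and later only the requested lines are read into memory.

```python
import os
import tempfile


def build_line_offsets(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets


def read_lines(path, offsets, start, count):
    """Materialize only `count` lines beginning at line index `start`."""
    selected = offsets[start:start + count]
    if not selected:
        return []
    lines = []
    with open(path, "rb") as f:
        # Jump straight to the first requested line instead of reading from the top.
        f.seek(selected[0])
        for _ in selected:
            lines.append(f.readline().decode("utf-8").rstrip("\n"))
    return lines


# Small demonstration on a temporary file (stand-in for a large corpus).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma\ndelta\n")

offsets = build_line_offsets(tmp.name)
chunk = read_lines(tmp.name, offsets, start=1, count=2)
os.remove(tmp.name)
```

Only the list of integer offsets stays resident, so its memory cost grows with the number of lines rather than with the size of the file. In a multi-worker setup, each worker could be handed a different `start`/`count` range, though how that should be split is an assumption the comments above do not spell out.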