offsite-tuning
Usage of Pile dataset to train the emulator
Hi,
I noticed that you trained the NLP emulator on the first 30 chunks of the Pile dataset. How large are these 30 chunks? In other words, how many chunks does the Pile have in total? The original Pile dataset is over 800 GB, which is too large for our lab's resources...
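For a rough sense of scale, here is a back-of-the-envelope estimate of the per-chunk size, assuming the ~800 GB figure refers to the full dataset and the chunks are evenly sized (both assumptions, not confirmed by the repo):

```python
# Assumption: ~800 GB total, split evenly into chunks.
total_gb = 800   # approximate size of the full Pile dataset
num_chunks = 30  # chunks used to train the emulator

per_chunk_gb = total_gb / num_chunks
print(f"~{per_chunk_gb:.1f} GB per chunk")  # ~26.7 GB per chunk
```

So even 30 chunks could still amount to the whole corpus if "chunk" means a shard of the full dataset, which is why the total chunk count matters here.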
Also, did you try training on smaller datasets, such as WikiText? How does the emulator perform with them?
Thanks