
How do I use these datasets to train new models?

Open · win10ogod opened this issue 1 year ago · 1 comment

How do I use these datasets to train new models?

- https://huggingface.co/datasets/Skywork/SkyPile-150B
- https://huggingface.co/datasets/EleutherAI/proof-pile-2

win10ogod · Jan 12 '24 02:01

@jzhang38 Could you provide a script? I'm a little confused about how to modify the script.

win10ogod · Jan 12 '24 02:01

Hi, we are working on these two datasets and will release the scripts when we finish.

ChaosCodes · Feb 08 '24 14:02