Expected Data Format

Open aflah02 opened this issue 1 year ago • 1 comments

❓ The question

I was looking at the config files and noticed that they sometimes point to .npy files for the dataset. Is there a script to generate these files from a set of text files or from another format?

aflah02 avatar Aug 27 '24 12:08 aflah02

You can download the dataset directly from Hugging Face or use the Dolma toolkit. The Hugging Face repository provides easy access to the dataset, and the Dolma toolkit offers utilities for handling different data formats. If you need further help, feel free to follow up.
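
To make the target format concrete, here is a minimal sketch of turning text files into a token file. It assumes that the .npy files referenced in the configs are flat, header-less arrays of uint16 token IDs that can be read back with np.memmap; that assumption is not confirmed in this thread, so check it against the data loader you are using. The toy_tokenize function, the EOS_ID value, and write_token_file are hypothetical stand-ins for illustration only; for real data you would use the actual tokenizer, for example through the Dolma toolkit.

```python
import tempfile
from pathlib import Path

import numpy as np

EOS_ID = 0  # hypothetical end-of-document ID; a real tokenizer defines its own


def toy_tokenize(text, vocab):
    """Stand-in tokenizer: map each whitespace-separated word to an integer ID."""
    return [vocab.setdefault(word, len(vocab) + 1) for word in text.split()]


def write_token_file(text_paths, out_path, tokenize, dtype=np.uint16):
    """Tokenize each text file, append EOS_ID after it, and write raw token IDs."""
    ids = []
    for path in text_paths:
        ids.extend(tokenize(Path(path).read_text(encoding="utf-8")))
        ids.append(EOS_ID)  # mark the document boundary
    if ids and max(ids) > np.iinfo(dtype).max:
        raise ValueError("a token ID does not fit in the chosen dtype")
    # tofile writes the raw bytes only (no .npy header), so np.memmap can read it
    np.asarray(ids, dtype=dtype).tofile(str(out_path))
    return len(ids)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        docs = [Path(tmp) / "a.txt", Path(tmp) / "b.txt"]
        docs[0].write_text("hello world", encoding="utf-8")
        docs[1].write_text("hello again", encoding="utf-8")

        vocab = {}
        out = Path(tmp) / "tokens.npy"
        count = write_token_file(docs, out, lambda t: toy_tokenize(t, vocab))

        # Read the file back the way a memory-mapped data loader would.
        tokens = np.memmap(out, dtype=np.uint16, mode="r")
        print(count, tokens.tolist())
        del tokens  # release the mapping before the directory is removed
```

Under this assumption the file size in bytes is simply the number of tokens times two, which is why uint16 only works for vocabularies of at most 65,536 entries; a larger vocabulary would need a wider dtype on both the writing and the reading side.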

aman-17 avatar Oct 19 '24 23:10 aman-17

Hi, thanks again for the inquiry! We’re currently working on closing out old tickets, so we’re closing this out for now, but if you require a follow-up response, please re-open and we will get back to you!

baileykuehl avatar Jul 01 '25 17:07 baileykuehl

Thanks! Sorry I missed replying earlier.

aflah02 avatar Jul 17 '25 17:07 aflah02