RedPajama-Data

Recommended way to load wget-downloaded data using HF datasets API?

Open • zijwang opened this issue 1 year ago • 1 comment

I downloaded the data following the instructions here. Is there a recommended way to load it via the HF datasets API, similar to this?

zijwang • Jan 16 '24 19:01

Hi @zijwang, my guess is that you can use the RPv2 data loader script here and change its _URL_BASE variable to point at the base directory on your filesystem. You should then be able to pass the path of your modified data loading script to datasets.load_dataset (here is an explanation about this).
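A minimal sketch of that suggestion, under several assumptions that are not confirmed in this thread: that you have saved a local copy of the loader script, that its `_URL_BASE` assignment sits on a single top-level line, and that your installed `datasets` version still supports script-based loading (some versions may also need `trust_remote_code=True`). The helper names `patch_url_base`, `write_local_loader`, and `load_local`, and all paths, are hypothetical.

```python
import re
from pathlib import Path


def patch_url_base(script_text: str, local_base: str) -> str:
    """Return script_text with its `_URL_BASE = ...` line pointing at local_base.

    Assumes the assignment is a single top-level line in the loader script.
    """
    pattern = re.compile(r"^_URL_BASE\s*=.*$", flags=re.MULTILINE)
    # A callable replacement avoids backslash-escape surprises in paths.
    patched, count = pattern.subn(
        lambda match: f"_URL_BASE = {local_base!r}", script_text, count=1
    )
    if count == 0:
        raise ValueError("no _URL_BASE assignment found in the script")
    return patched


def write_local_loader(original_script: str, patched_script: str, local_base: str) -> Path:
    """Write a patched copy of the loader script and return its path."""
    source = Path(original_script)
    target = Path(patched_script)
    text = source.read_text(encoding="utf-8")
    target.write_text(patch_url_base(text, local_base), encoding="utf-8")
    return target


def load_local(patched_script: str, **kwargs):
    """Pass the patched script to datasets.load_dataset.

    Imported lazily so the helpers above work without `datasets` installed.
    Extra keyword arguments (config name, split, ...) are forwarded unchanged.
    """
    from datasets import load_dataset

    return load_dataset(patched_script, **kwargs)
```

For example, `write_local_loader("rpv2_loader.py", "rpv2_loader_local.py", "/data/rpv2")` followed by `load_local("rpv2_loader_local.py")` would mirror the steps described above. The file names here are placeholders, and whether the script resolves local paths exactly as it resolves URLs is untested.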

mauriceweber • Jan 18 '24 09:01