
How to look into the processed data?

Open shizhediao opened this issue 6 months ago • 3 comments

Hi,

After running tokenize_from_hf_to_s3.py, I would like to inspect the resulting data, but the output is stored in binary files (.ds). Is there a way to look into the data?

Thanks!

shizhediao, Aug 16 '24 16:08