datatrove
How to look into the processed data?
Hi,
After running tokenize_from_hf_to_s3.py, I would like to inspect the resulting data. However, the output is a binary file (.ds). Is there a way to look into the data?
Thanks!
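
One possible way to peek at such a file is sketched below. It rests on an assumption about the layout that is not confirmed in this thread: that the .ds file is a headerless, flat sequence of little-endian unsigned integer token ids, 2 bytes each (or 4 bytes for large vocabularies). The helper name read_token_ids and its parameters are hypothetical, not part of datatrove.

```python
import struct


def read_token_ids(path, token_size=2, limit=20):
    """Return up to `limit` token ids from the start of a binary token file.

    ASSUMPTION: the file is a headerless run of little-endian unsigned
    integers, `token_size` bytes each (2 -> uint16, 4 -> uint32).
    """
    fmt_char = {2: "H", 4: "I"}[token_size]
    # Read only the bytes we need, so large files are not loaded whole.
    with open(path, "rb") as f:
        raw = f.read(limit * token_size)
    # Drop a trailing partial token if the file is shorter than expected.
    count = len(raw) // token_size
    return list(struct.unpack(f"<{count}{fmt_char}", raw[: count * token_size]))
```

If the assumption holds, the returned ids can be turned back into text by passing them to the decode method of the same tokenizer that was used for tokenization. If the numbers look like noise, the token size or the layout assumption is probably wrong.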