WenJett
Hi @aman-17, I have also asked about the same issue on the dolma GitHub (https://github.com/allenai/dolma/issues/225), where @soldni kindly responded, but we could not identify the problem. I have also uploaded...
Hi @soldni, [data.json.gz](https://github.com/user-attachments/files/18583143/data.json.gz) I have uploaded the data.json.gz file (linked above) that I have been testing the pipeline with. It is a test file, hence only ~10 data points, which resulted in "unable to...
Hi @soldni, I checked my toolkit version, and it is the same as yours: 1.0.14.post1. I also tried updating the toolkit to version 1.1.0, but that does not resolve the issue...
My output matches what is in the 'text' field, which I have pasted below. I do not get any errors at all. Edited to show one full "text" output...