Vedaad Shakib

Results: 1 issue by Vedaad Shakib

Hi, while downloading and processing Dolma v1.7, I noticed that the dataset contains many duplicate samples sharing the same `id` field. E.g. in the `Project Gutenberg` source, there...
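Below is a minimal sketch of how duplicates like these could be counted. It assumes the shards are gzipped JSON Lines files with one document per line, each record carrying an `id` key. The function names (`count_ids`, `duplicate_ids`, `count_ids_in_files`) are hypothetical helpers for illustration and are not part of any Dolma tooling.

```python
import gzip
import json
from collections import Counter
from typing import Iterable


def count_ids(lines: Iterable[str]) -> Counter:
    """Count how often each `id` value appears in JSONL records."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record["id"]] += 1  # assumes every record has an "id" key
    return counts


def duplicate_ids(counts: Counter) -> dict:
    """Return only the ids that occur more than once, with their counts."""
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


def count_ids_in_files(paths: Iterable[str]) -> Counter:
    """Aggregate id counts across several shards (assumed gzipped JSONL)."""
    total: Counter = Counter()
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            total.update(count_ids(fh))  # Counter.update adds counts
    return total


if __name__ == "__main__":
    sample = [
        '{"id": "a", "text": "first"}',
        '{"id": "b", "text": "second"}',
        '{"id": "a", "text": "first again"}',
    ]
    print(duplicate_ids(count_ids(sample)))  # {'a': 2}
```

Holding every `id` in one `Counter` is simple but memory-bound; restricting the pass to a single source (for example only the Project Gutenberg shards) keeps the working set small. A repeated `id` alone does not say whether the texts are identical, so comparing a hash of `text` for each duplicated `id` would be a natural follow-up check.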