olm-datasets

Pipeline for pulling and processing online language model pretraining data from the web

Results 1 olm-datasets issues

Thanks a lot for putting this repo together and providing the fresh CC dumps at HF. I was looking for a way to find dataset splits for other languages but...