olm-datasets
Pipeline for pulling and processing online language model pretraining data from the web
Results: 1
olm-datasets issues
Thanks a lot for putting this repo together and providing the fresh CC dumps at HF. I was looking for a way to find dataset splits for other languages but...