olm-datasets
CC data Language Splits
Thanks a lot for putting this repo together and providing the fresh CC dumps on HF. I was looking for splits of these datasets in other languages but couldn't find any. Are the olm/olm-CC-MAIN-* datasets monolingual by chance?