olm-datasets

CC data Language Splits

Open KeremTurgutlu opened this issue 1 year ago • 3 comments

Thanks a lot for putting this repo together and providing the fresh CC dumps on HF. I was looking for dataset splits for other languages but couldn't find a way to get them. Are the olm/olm-CC-MAIN-* datasets monolingual by chance?

KeremTurgutlu, Mar 02 '23 06:03