Soyeon Kim

Results 6 issues of Soyeon Kim

Hi, thank you in advance. I really like this theme. By the way, how can I change the font size of the pages (posts, about, and so on)? I only found...

One more question, please. Using the provided command, how long does it take to finish each step (e.g., quality filtering, deduplication, quality classifier) when processing a single index of Common Crawl (e.g., 2023-06...

Hi, thank you in advance. I am facing the following error while using the same command for processing Common Crawl as in the README. `python -m cc_net --dump 2023-06 --task_parallelism 20 --num_shards 5000 -l...

Hi there. Regarding the quality signals, the README only provides the fastText-based model trained on wiki. Would it be possible to share the palm version (books, wiki, owt) too?...

Hey, thank you in advance for your great work and for sharing the data :) I read the README and the Hugging Face details, and it was unclear to me whether fuzzy deduplication is actually done on...

Hi, in this pipeline, the major steps are as follows: 1. quality filtering (cc-net) 2. deduplication 3. filtering by a classifier (trained with sampled Common Crawl and wiki text). My question is how each...