Config files for dolma 1.7
Hi,
Are the config files used to create dolma 1.7 available somewhere, or could you share them here?
Some of the finer points of the cleaning, such as the fuzzy deduplication, are not easy for me to replicate.
Hi! Thanks for the question. We’re currently working on closing out old tickets, and we apologize that we didn’t get to you in a timely fashion. We’re closing this out for now, but if you’d still like an answer, please re-open and we will get back to you!
Hi,
I may have missed a further explanation or the config files being added, but I would still appreciate some help, or a simple example config file that mimics the filtering used for the official dolma datasets.