OpusCleaner
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
Now I'm actually using the thing, I'm noticing an urge to document why I'm adding filters and why I set the parameters as I set them. But there's no proper...
It would be great to have full examples of the filters that were used to train a language pair. This would simplify the usage of the tool by non-expert users.
Hey, thank you for the great work as always! We're looking into integrating OpusFilter in Firefox Translations training pipeline! Our workflow is quite automated and most likely we'll keep our...
Workflow
Sorry, bad title; I need to jot down some notes. Empty-train workflow, long version (maybe you can skip steps?):
1. Select datasets
   1. Download each dataset
   2. Generate samples
2. Select...
If you have many datasets, I suspect you'd want to apply the same filter steps to quite a few of them. It would be helpful if we can provide some...
I downloaded a dataset, CCAligned-v1.en-mt. It has 37 sentences and maybe 2 are correct. How do I mark it as "do not use"?
I was surprised to learn that the `langid` filter is CLD2 and the `fasttext_filter` is fasttext langid. From a UI perspective, it's better to have one. Aside, can we add...
When I click on it, nothing happens.
1. Run fresh opuscleaner
2. Load http://localhost:8000 and get the empty folder (expected)
3. Click import dataset, select Maltese and English, click way too many buttons to download everything.
4. ...
Right now, opuscleaner is tricky to install because it will pull in all the dependencies for most filters. I'm tempted to remove most of the external filters (opusfilter, bicleaner, etc)...