Evgeny Pavlov
# Experiment insights

## OpusCleaner

- legacy cleaning slightly outperforms all OpusCleaner configs (likely due to num_mismatch filter in OpusCleaner)
- large FastText model significantly reduces false positives compared to...
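
To make the first point concrete, here is a minimal sketch of a number-mismatch check. It is not OpusCleaner's implementation: the regular expression, the function names `numbers` and `keep_pair`, and the sample sentence pairs are all assumptions made for illustration. It only shows how a filter of this kind can also discard valid translations, for example when one side spells a number out.

```python
import re

# Assumption: a "number" is a run of digits, optionally with . or , separators.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def numbers(text):
    """Return the sorted list of numeric tokens found in text."""
    return sorted(NUMBER_RE.findall(text))


def keep_pair(source, target):
    """Keep a sentence pair only if both sides contain the same numbers."""
    return numbers(source) == numbers(target)


# Hypothetical sentence pairs, for illustration only.
pairs = [
    ("I have 2 cats", "Ich habe 2 Katzen"),      # numbers match: kept
    ("I have two cats", "Ich habe 2 Katzen"),    # valid translation, but dropped
    ("Published in 1999", "Erschienen 2001"),    # real mismatch: dropped
]

kept = [pair for pair in pairs if keep_pair(*pair)]
print(kept)
```

The second pair is a correct translation that a strict check like this removes, which is one way such a filter could cost useful training data.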
- Make sure that the integrated [OpusFilter](https://helsinki-nlp.github.io/OpusFilter/index.html) works
- Produce configs with OpusFilter
- Compare results to regular OpusCleaner based configs
We are especially interested in publishing the full training live.log as a file to W&B artifacts or logs (wherever it will be more convenient to view it). This can be...
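
One possible way to do this, sketched under assumptions: the helper name `publish_training_log` is hypothetical, and `run` is assumed to be the object returned by `wandb.init(...)`. The sketch relies on the run's `save` method with `policy="live"`, which asks W&B to re-upload the file as it changes.

```python
from pathlib import Path


def publish_training_log(run, log_path):
    """Ask W&B to upload log_path and keep re-uploading it as it grows.

    `run` is assumed to be a W&B run object (the return value of wandb.init).
    """
    log_path = Path(log_path).resolve()
    if not log_path.is_file():
        raise FileNotFoundError(log_path)
    # policy="live" re-uploads the file whenever training appends to it.
    run.save(str(log_path), base_path=str(log_path.parent), policy="live")


# Typical use (needs the `wandb` package and credentials; names are hypothetical):
#   import wandb
#   run = wandb.init(project="my-project", group="my-group")
#   publish_training_log(run, "live.log")
#   ...training...
#   run.finish()
```

The file then shows up under the run's files in the W&B UI. If a versioned copy is preferred, the same log could instead be attached as a `wandb.Artifact` once training finishes.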
This includes publishing:

- live training logs to W&B dashboards

I assume we'll have separate publishing scripts for other things.

Let's use [Taskgraph transforms](https://taskcluster-taskgraph.readthedocs.io/en/latest/concepts/transforms.html) to avoid polluting Taskcluster kinds with...
Some weird things I noticed in https://wandb.ai/moz-translations/lt-en:

- teacher-ensemble evals is empty
- group logs doesn't have any metrics
- group logs is missing for some groups
- quantized is...
Issues with the current implementation:

- We use naive tokenization because it's what OpusTrainer requires. This might produce alignments of lower quality because we don't take into account punctuation and...
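
The tokenization issue can be illustrated with a small comparison. This is only a sketch: the regex-based tokenizer below is an assumption standing in for a real punctuation-aware tokenizer, and neither function is taken from the pipeline or from OpusTrainer.

```python
import re


def naive_tokenize(text):
    """Whitespace-only splitting: punctuation stays glued to the words."""
    return text.split()


# Assumption: a token is either a run of word characters or a single
# non-space, non-word character (such as a punctuation mark).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def punctuation_aware_tokenize(text):
    """Split punctuation into separate tokens."""
    return TOKEN_RE.findall(text)


sentence = "Hello, world!"
print(naive_tokenize(sentence))              # ['Hello,', 'world!']
print(punctuation_aware_tokenize(sentence))  # ['Hello', ',', 'world', '!']
```

With whitespace splitting, `Hello,` and `Hello` are different tokens to the aligner, so the same word is counted under several surface forms depending on the punctuation attached to it.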