Evgeny Pavlov

185 issues by Evgeny Pavlov

Experiment insights (OpusCleaner):
- legacy cleaning slightly outperforms all OpusCleaner configs (likely due to the num_mismatch filter in OpusCleaner)
- the large FastText model significantly reduces false positives compared to...
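The first insight attributes the gap to OpusCleaner's num_mismatch filter. To make the idea of a numeric-mismatch filter concrete, here is a minimal Python sketch. It assumes such a filter drops sentence pairs whose source and target contain different numbers. The regex, the separator normalisation, and the function names are hypothetical and are not OpusCleaner's actual implementation.

```python
import re

# Hypothetical pattern: integers and numbers with "." or "," separators,
# e.g. "42", "3.14", "1,000". OpusCleaner's real filter may differ.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text: str) -> list[str]:
    """Return the numbers in text, with separators stripped, sorted."""
    return sorted(re.sub(r"[.,]", "", m) for m in NUMBER_RE.findall(text))


def numbers_match(source: str, target: str) -> bool:
    """Keep a pair only if both sides contain the same multiset of numbers."""
    return extract_numbers(source) == extract_numbers(target)


def filter_pairs(pairs):
    """Yield only the (source, target) pairs whose numbers agree."""
    for source, target in pairs:
        if numbers_match(source, target):
            yield source, target
```

Comparing sorted lists treats the numbers as a multiset, so word-order differences between languages don't matter. Stripping `.` and `,` makes `1,000` and `1.000` compare equal. However, an exact-match rule this strict also drops pairs where one side spells a number out in words.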

- Make sure that the integrated [OpusFilter](https://helsinki-nlp.github.io/OpusFilter/index.html) works
- Produce configs with OpusFilter
- Compare results to regular OpusCleaner-based configs

quality

We are especially interested in publishing the full training live.log as a file to W&B artifacts or logs (wherever it is more convenient to view). This can be...

platform

This includes publishing:
- live training logs to W&B dashboards

I assume we'll have separate publishing scripts for other things. Let's use [Taskgraph transforms](https://taskcluster-taskgraph.readthedocs.io/en/latest/concepts/transforms.html) to avoid polluting Taskcluster kinds with...

platform
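Taskgraph transforms are Python generator functions that take `(config, tasks)` and yield possibly modified tasks, chained in order. The sketch below mimics that pattern in plain Python so it runs without taskgraph installed. `TransformChain` is a simplified stand-in for taskgraph's `TransformSequence`. The `stage` attribute, the `run.command` layout, and the `publish_to_wandb.py` script are hypothetical names and are not part of this repository.

```python
from typing import Callable, Iterable, Iterator

Task = dict
Transform = Callable[[dict, Iterable[Task]], Iterator[Task]]


class TransformChain:
    """Simplified stand-in for taskgraph's TransformSequence (not the real class)."""

    def __init__(self) -> None:
        self._transforms: list[Transform] = []

    def add(self, func: Transform) -> Transform:
        """Register a transform; usable as a decorator."""
        self._transforms.append(func)
        return func

    def __call__(self, config: dict, tasks: Iterable[Task]) -> Iterator[Task]:
        """Feed the task stream through each registered transform in order."""
        for transform in self._transforms:
            tasks = transform(config, tasks)
        return iter(tasks)


transforms = TransformChain()


@transforms.add
def add_wandb_publishing(config: dict, tasks: Iterable[Task]) -> Iterator[Task]:
    """Append a hypothetical W&B publishing step to training tasks only."""
    for task in tasks:
        if task.get("attributes", {}).get("stage") == "train":
            # Hypothetical script; with "&&" it runs only if training succeeds.
            task["run"]["command"] += " && python publish_to_wandb.py live.log"
        yield task
```

In a real kind, a function like this would be registered with `TransformSequence().add` from `taskgraph.transforms.base` and listed under the kind's `transforms`. That keeps the publishing logic out of each kind's YAML definition.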

Some weird things I noticed in https://wandb.ai/moz-translations/lt-en:
- teacher-ensemble evals is empty
- group logs doesn't have any metrics
- group logs is missing for some groups
- quantized is...

bug
platform

I have to cancel tasks separately now.

taskcluster
tc-p1

Issues with the current implementation:
- We use naive tokenization because it's what OpusTrainer requires. This might produce lower-quality alignments because we don't take punctuation into account and...

enhancement
quality
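To illustrate the tokenization issue, this Python sketch contrasts whitespace tokenization with a simple punctuation-aware regex. It then maps alignments computed on the finer tokens back to whitespace-token indices, which is one possible way to keep the whitespace-token format that OpusTrainer requires. The regex, the function names, and the remapping strategy are assumptions for illustration and are not the project's current code.

```python
import re

# Hypothetical punctuation-aware pattern: word runs or single punctuation marks.
PUNCT_AWARE_RE = re.compile(r"\w+|[^\w\s]")
WHITESPACE_TOKEN_RE = re.compile(r"\S+")


def naive_tokenize(sentence: str) -> list[str]:
    """Whitespace-only split: punctuation stays attached to words."""
    return sentence.split()


def punct_aware_tokenize(sentence: str) -> list[str]:
    """Split words and punctuation into separate tokens."""
    return PUNCT_AWARE_RE.findall(sentence)


def fine_to_naive_index(sentence: str) -> list[int]:
    """For each punctuation-aware token, return the index of the whitespace token containing it."""
    naive_spans = [m.span() for m in WHITESPACE_TOKEN_RE.finditer(sentence)]
    mapping = []
    for match in PUNCT_AWARE_RE.finditer(sentence):
        start = match.start()
        for index, (span_start, span_end) in enumerate(naive_spans):
            if span_start <= start < span_end:
                mapping.append(index)
                break
    return mapping


def remap_alignment(alignment, source: str, target: str) -> list[tuple[int, int]]:
    """Convert (src, tgt) pairs over fine tokens into deduplicated pairs over whitespace tokens."""
    src_map = fine_to_naive_index(source)
    tgt_map = fine_to_naive_index(target)
    return sorted({(src_map[i], tgt_map[j]) for i, j in alignment})
```

For example, `naive_tokenize("Hello, world!")` returns `["Hello,", "world!"]`. An aligner therefore sees `world!` and `world` as unrelated tokens. The punctuation-aware variant returns `["Hello", ",", "world", "!"]` instead.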