Evgeny Pavlov issues

Results: 185 issues of Evgeny Pavlov

[Experiment] Data cleaning Apr 2024

# Experiment insights
## OpusCleaner
- legacy cleaning slightly outperforms all OpusCleaner configs (likely due to num_mismatch filter in OpusCleaner)
- large FastText model significantly reduces false positives compared to...
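To illustrate what a number-mismatch filter does in general, here is a minimal sketch. It is not OpusCleaner's implementation: the function name `numbers_match`, the regular expression, and the `min_ratio` parameter are all assumptions made for this example. It keeps a sentence pair only when the numbers found on both sides overlap enough.

```python
import re

# Hypothetical pattern: digit runs, optionally grouped with "." or ","
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def numbers_match(src: str, trg: str, min_ratio: float = 1.0) -> bool:
    """Return True if the pair should be kept (illustrative sketch only)."""
    src_nums = set(NUMBER_RE.findall(src))
    trg_nums = set(NUMBER_RE.findall(trg))
    union = src_nums | trg_nums
    if not union:
        # No numbers on either side: nothing to compare, keep the pair
        return True
    overlap = len(src_nums & trg_nums) / len(union)
    return overlap >= min_ratio


print(numbers_match("I have 2 cats", "Ich habe 2 Katzen"))
print(numbers_match("I have 2 cats", "Ich habe 3 Katzen"))
print(numbers_match("1,000 items", "1.000 Artikel"))
```

In this sketch the last pair is rejected even though it is a valid translation, because the two sides format the same number differently. That is one way a strict filter of this kind could discard good data, though whether it explains the result above is not established here.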

Test OpusFilter with OpusCleaner

- Make sure that the integrated [OpusFilter](https://helsinki-nlp.github.io/OpusFilter/index.html) works
- Produce configs with OpusFilter
- Compare results to regular OpusCleaner-based configs

quality

Publish task logs from Taskcluster

We are especially interested in publishing the full training live.log as a file to W&B artifacts or logs (wherever it will be more convenient to view it). This can be...

platform

Publish experiment config from Taskcluster

platform

Publish Marian config from Taskcluster

platform

Publish evals from Taskcluster

platform

Publish training charts from Taskcluster

This includes publishing:
- live training logs to W&B dashboards

I assume we'll have separate publishing scripts for other things. Let's use [Taskgraph transforms](https://taskcluster-taskgraph.readthedocs.io/en/latest/concepts/transforms.html) to avoid polluting Taskcluster kinds with...

platform

Issues with uploaded experiments

Some weird things I noticed in https://wandb.ai/moz-translations/lt-en:
- teacher-ensemble evals is empty
- group logs doesn't have any metrics
- group logs is missing for some groups
- quantized is...

bug

platform

"Cancel all" action doesn't work

I have to cancel tasks separately now.

taskcluster

tc-p1

Improve implementation of alignments

Issues with the current implementation:
- We use naive tokenization because it's what OpusTrainer requires. This might produce alignments of lower quality because we don't take into account punctuation and...
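The difference between naive and punctuation-aware tokenization can be shown with a small sketch. Both function names and the regular expression are assumptions made for this example. Neither function is taken from OpusTrainer or from the project's alignment code.

```python
import re


def naive_tokenize(text: str) -> list[str]:
    """Split on whitespace only, so punctuation stays attached to words."""
    return text.split()


def punct_aware_tokenize(text: str) -> list[str]:
    """Emit runs of word characters and single punctuation marks separately."""
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "Hello, world! How are you?"
print(naive_tokenize(sentence))
print(punct_aware_tokenize(sentence))
```

With whitespace splitting, `Hello,` and `Hello` are different tokens, so a word aligner sees more distinct token types and less evidence for each. This is one plausible reason for the lower alignment quality mentioned above.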

enhancement

quality