It's basically models, logs and experiments from the [snakemake directory structure](https://mozilla.github.io/firefox-translations-training/snakemake.html#directory-structure):

```
gsutil ls gs://releng-translations-dev
gs://releng-translations-dev/data/
gs://releng-translations-dev/experiments/
gs://releng-translations-dev/logs/
gs://releng-translations-dev/models/
```

We use `/data` to store custom datasets, unlike for snakemake,...
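For example, uploading a custom dataset there could look like the sketch below; the file name and subdirectory are hypothetical, only the `/data` prefix comes from the layout above:

```
# Hypothetical dataset file and subdirectory; only the /data prefix is part of the layout above.
gsutil cp custom-corpus.en-ru.tsv.gz gs://releng-translations-dev/data/custom-corpus/
```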
Yes, it looks correct overall; some additions:

- let's not forget about `quantize` and `evaluate quantized`, they also produce the model and evaluation results
- instead of `retrain` let's use...
The old config is YAML, so let's upload the YAML from Taskcluster instead of JSON. Old en-ru experiment: `gs://releng-translations-dev/experiments/en-ru/ny-retraining/config.yml`

Both train.log and live.log are useful. Vocab is also needed even...
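A rough sketch of the corresponding `gsutil cp` uploads; the config path is the one quoted above, but the log and vocab destinations are illustrative guesses based on the bucket layout, not confirmed locations:

```
# Config path taken from the experiment mentioned above.
gsutil cp config.yml gs://releng-translations-dev/experiments/en-ru/ny-retraining/config.yml

# Hypothetical destinations for logs and vocab; adjust to whatever layout we agree on.
gsutil cp train.log live.log gs://releng-translations-dev/logs/en-ru/ny-retraining/
gsutil cp vocab.spm gs://releng-translations-dev/models/en-ru/ny-retraining/vocab/
```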
> > Vocab is also needed even though we now also store it as model artifacts.
>
> And it belongs in directories like `models/en-ru/retrain1_/vocab` ?

Correct

> > ```...
> > > I did include evaluate-quantized - I assumed that was what ended up in the speed directory - is that wrong?
> > >
> > > Yes,...
We should update to 3.0, so closing in favour of #528. Also, we're already using the models from HF.
Very good idea! I've been thinking about it too.
We run the pipeline per dataset, so the filtered sentences will likely be an artifact of the clean step. What would be useful is seeing what was filtered by each...
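As a rough illustration (not part of the pipeline), one way to see what a clean step dropped overall is to diff its input against its output; the file names here are hypothetical:

```
# Hypothetical files: raw parallel data for one dataset and the output of the clean step.
# `comm -23` prints lines present in the first file but not the second, i.e. the filtered-out pairs.
comm -23 <(sort original.en-ru.tsv) <(sort cleaned.en-ru.tsv) > filtered-out.en-ru.tsv
```

This only shows what was removed in total; attributing removals to each individual filter would still need support from the cleaning tool itself.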
Agreed. Just a note that we already download bicleaner-ai models from Hugging Face using the `bicleaner-ai-download` tool, which uses their library to pull the data.
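For reference, a download looks roughly like the sketch below; the argument order is an assumption and may differ between bicleaner-ai versions, so check `bicleaner-ai-download --help` for the installed release:

```
# Roughly: source language, target language, model flavour, and a local output directory.
# Argument order is an assumption; verify with --help for the installed version.
bicleaner-ai-download en ru full ./bicleaner-ai-models
```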
It's also interesting that the same job completed successfully on restart: https://firefox-ci-tc.services.mozilla.com/tasks/d95IBhOiS0OYp5LR69Sk6w/runs/0/logs/public/logs/live.log

It was likely a temporary issue with FastText model downloading. If OpusCleaner properly failed on error, Taskcluster...