Jaume Zaragoza issues

Results: 53 issues of Jaume Zaragoza

Add Abkhaz

I found this [corpus](https://huggingface.co/datasets/Nart/abkhaz_text) coming from the Abkhazian National Corpus and Common Voice, so it probably has no language pollution and can be used for training. I [asked](https://huggingface.co/datasets/Nart/abkhaz_text/discussions/3) just...

`lr-decay-strategy epoch+stalled` not working

### Bug description
`lr-decay-strategy epoch+stalled` does not decay the learning rate after stalled validation.
### How to reproduce
Set `--lr-decay 0.5 --lr-decay-strategy epoch+stalled --lr-decay-start 1 1` and wait until one...

bug

Decrease minimum HPLT document score

While training Thai, I noticed we got only 3M sentences for backtranslations, and something similar happened to me before with other languages. So I decided to suggest this change and avoid a...

Show the source sentence in the evals UI

When looking at individual sentence scores, it's difficult to tell whether the translation is correct without the source, or to guess what the source of the error is. Showing the reference...

evals

Model fine-tuning support

If at some point we want this to be supported, I think it should be sketched a little bit. I also had some ideas that I don't want to forget....

quality

taskcluster

We need filter debugging for OpusCleaner

Especially when running complicated language pairs that may not be well supported and suffer a lot from filtering ([like](https://github.com/mozilla/translations/pull/1288#issuecomment-3559677897) Chinese Traditional), we need a detailed description of how much data...

quality

Language identification remaining tasks

- [ ] Update monolingual cleaning to use newest LID tools.
- [ ] Short report of 100 languages.
- [ ] Choose LID tool based on target language.

quality

Experiment with student model parameters (part 2)

(continuation of #894) If we want to continue experimenting with student parameters, there are still combinations that we could try.
- [ ] More combinations of the parameters that Greg...

quality

experiment

Enable fp16 training

FP16 training can increase throughput a lot and may not hurt quality. I'm testing it.

cost & perf

Possible inconsistencies in dashes with OpenSubtitles

As we discussed yesterday, it's possible that `- ` dashes at the beginning of sentences in OpenSubtitles are inconsistent and [may cause extra tokens in the output](https://github.com/mozilla/translations/issues/215#issuecomment-3327405949). We discussed that...

good first issue

quality