Jaume Zaragoza comments

Results: 124 comments of Jaume Zaragoza

bicleaner-ai-classify intermittently fails to download fasttext model

I tested deleting the fasttext model from the filesystem, and Bicleaner AI triggers the download only when hardrules is enabled. This probably happened when translations still had hardrules enabled. So...

evaluate-quantized step fails in CI

This is probably failing because it is using the binary shortlist parameter with a shortlist in text format

```
... --shortlist /builds/worker/fetches/lex.s2t.pruned false ...
```

and will be fixed by #1169

[Experiment] Chinese Traditional

So, regarding mono hplt, this seems to be the cleaning summary:

```
22735836 HPLT_LID_SEGMENT
29798622 DUPLICATE
2028464 CLEAN_LID
3920256 RATIO_ALPHA
3683762 RATIO_CHARS
11074337 TOO_LONG
```

things that I think may...
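To compare the filters in the summary above at a glance, the counts can be turned into relative shares. This is only a sketch: it assumes each number is the count of segments tagged with the label that follows it and that the labels do not overlap, neither of which is confirmed here. `parse_summary` and `shares` are hypothetical helper names.

```python
# Sketch only: assumes "<count> <label>" pairs and non-overlapping labels.
SUMMARY = (
    "22735836 HPLT_LID_SEGMENT 29798622 DUPLICATE 2028464 CLEAN_LID "
    "3920256 RATIO_ALPHA 3683762 RATIO_CHARS 11074337 TOO_LONG"
)


def parse_summary(text: str) -> dict[str, int]:
    """Turn whitespace-separated '<count> <label>' pairs into a dict."""
    tokens = text.split()
    return {label: int(count) for count, label in zip(tokens[0::2], tokens[1::2])}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the summed counts attributed to each label."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    counts = parse_summary(SUMMARY)
    # Print labels from the largest share to the smallest.
    for label, frac in sorted(shares(counts).items(), key=lambda kv: -kv[1]):
        print(f"{label:<18} {counts[label]:>10} {frac:6.1%}")
```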

[Experiment] Chinese Traditional

Ok, so I took a look at parallel and it's a bit difficult to tell what's happening with the cleaning without a debug log for the filters. Probably it was in part...

Extend supported models for other translators

I don't have a list. There is an extensive list here: https://opus.nlpl.eu/mt/release-history, but that does not give us the "best" model.

Show the source sentence in the evals UI

oh, sorry, I didn't remember this was due to the licenses. Could we somehow add a link to the dataset viewer? @evgenyrp

Show the source sentence in the evals UI

Maybe even a link to the exact row in the viewer? https://huggingface.co/datasets/facebook/bouquet/viewer/spa_Latn?views[]=spa_latn_dev&row=2

Possible inconsistencies in dashes with OpenSubtitles

I opened the issue for the record, but it seems that the dash issues are only frequent on old models that also have the short-sentence issues, so closing it for now.

Enable fp16 training

Throughput has gone up from 80k tok/s to 130k tok/s for teacher training.

We need filter debugging for OpusCleaner

Maybe something like [this tee](https://github.com/hplt-project/OpusCleaner/blob/5fef45344decce1275b3c1a60ec09cda61d478a4/opuscleaner/clean.py#L313) that counts lines at the beginning and at the end of each step. Or enable that tee option, then count the size of each step's output.
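The idea of counting lines before and after each step could look roughly like the sketch below. It is not OpusCleaner's implementation or API: `run_with_counts` and the two toy filters are hypothetical, and the steps are assumed to be plain functions that take an iterable of lines and yield the lines they keep.

```python
from typing import Callable, Iterable, Iterator

# A step is (name, function); the function yields the lines it keeps.
Step = tuple[str, Callable[[Iterable[str]], Iterator[str]]]


def run_with_counts(lines: Iterable[str], steps: list[Step]) -> tuple[list[str], dict[str, int]]:
    """Run steps in order, recording how many lines leave each one."""
    counts: dict[str, int] = {"input": 0}

    def counted(name: str, stream: Iterable[str]) -> Iterator[str]:
        # Pass lines through unchanged while counting them.
        for line in stream:
            counts[name] += 1
            yield line

    stream: Iterable[str] = counted("input", lines)
    for name, step in steps:
        counts[name] = 0
        stream = counted(name, step(stream))
    return list(stream), counts


# Toy filters, for illustration only.
def drop_empty(lines: Iterable[str]) -> Iterator[str]:
    return (line for line in lines if line.strip())


def drop_long(lines: Iterable[str], max_len: int = 3) -> Iterator[str]:
    return (line for line in lines if len(line) <= max_len)


if __name__ == "__main__":
    kept, counts = run_with_counts(
        ["a", "", "bbbbbb", "cc"],
        [("drop_empty", drop_empty), ("drop_long", drop_long)],
    )
    print(kept, counts)
```

The difference between two consecutive counts is the number of lines the later step removed, which is the per-filter figure that is hard to get without a debug log.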