Jaume Zaragoza

Results 124 comments of Jaume Zaragoza

That transliterator can work with mixed scripts perfectly.

Regarding those old comments of mine, I think byte fallback solves most of those issues.

From what I understand of how SentencePiece works, increasing the character coverage, or setting it to 1.0, just forces the SentencePiece model to have all the characters included in the...
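To make the idea of character coverage concrete, here is a minimal sketch of what a coverage ratio means: keep the most frequent characters until they account for the requested share of all character occurrences. This is only an illustration of the concept; the function name `covered_characters` is hypothetical and this is not SentencePiece's actual code.

```python
from collections import Counter


def covered_characters(corpus, coverage):
    """Return the most frequent characters needed to reach `coverage`.

    Illustrative only: `covered_characters` is a made-up helper, not part
    of the SentencePiece API.
    """
    counts = Counter(ch for line in corpus for ch in line)
    total = sum(counts.values())
    kept = set()
    accumulated = 0
    # Walk characters from most to least frequent and stop once the
    # characters kept so far already account for the requested share.
    for ch, count in counts.most_common():
        if accumulated / total >= coverage:
            break
        kept.add(ch)
        accumulated += count
    return kept


corpus = ["a" * 98 + "bc"]
print(sorted(covered_characters(corpus, 0.98)))
print(sorted(covered_characters(corpus, 1.0)))
```

With a coverage of 1.0 every character seen in the corpus is kept, including the rare ones; with a lower value the rarest characters are left out.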

Is a `[usize; 65536]` too much for that histogram?
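As a quick back-of-the-envelope check of that size, assuming a 64-bit target where `usize` is 8 bytes (an assumption; on a 32-bit target it would be 4 bytes), the arithmetic can be sketched as follows. The helper name `histogram_bytes` is hypothetical.

```python
def histogram_bytes(bins=65536, word_bytes=8):
    """Memory footprint of a histogram with `bins` counters.

    `word_bytes=8` assumes a 64-bit target (usize = 8 bytes).
    """
    return bins * word_bytes


size = histogram_bytes()
print(f"{size} bytes = {size // 1024} KiB")  # 524288 bytes = 512 KiB
```

Whether 512 KiB is too much depends mostly on where the array lives (stack or heap) and on how often it has to be zeroed.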

Has there been any progress on supporting u16 C-style enums? I'm not familiar with the bitcode source code, but I could try to implement it if it's easy and you point me...

These results seem very interesting to me. I believe the fact that NLLB and Paracrawl are full of redundant and repetitive data has something to do with this. If there...

I think the regex patterns without the anchors were the original intention of the filter. I think the idea of the filter is to be simple and remove very...

I guess this was because opuscleaner in filter mode runs several processes at the same time that write to the same file, and the ones that start later [do...

The only thing that comes to my mind is that extract-best uses BLEU, and BLEU is worse for Chinese than for other languages, since it relies on some tokenization. I personally changed...

Maybe also: - Add pairwise statistical significance tests. For COMET use `comet-compare`, and for chrF use `sacrebleu --paired-ar`. EDIT: this will help a lot when a couple of models have...
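For intuition about what a paired randomization test does, here is a minimal sketch over per-segment scores of two systems on the same test set. It assumes the metric is a plain average of segment scores; the function name `paired_randomization_p_value` is hypothetical, and this is not the implementation used by `comet-compare` or `sacrebleu`.

```python
import random


def paired_randomization_p_value(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired approximate randomization test on segment scores.

    Hypothetical helper for illustration; assumes the system-level score
    is the mean of the per-segment scores.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same segments")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    for _ in range(trials):
        # Swapping the two systems' outputs on a segment flips the sign
        # of that segment's score difference.
        shuffled = abs(sum(d if rng.random() < 0.5 else -d for d in diffs))
        if shuffled >= observed:
            extreme += 1
    # Add-one smoothing so the estimated p-value is never exactly zero.
    return (extreme + 1) / (trials + 1)


print(paired_randomization_p_value([0.9] * 20, [0.1] * 20, trials=2000))
```

A small p-value suggests the difference between the two systems is unlikely to come from chance assignment of outputs, which is what matters when two models have very close scores.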