Jaume Zaragoza
Using `Tatoeba-MT-models/gmq-eng/opusTCv20210807+bt_transformer-big_2022-03-09` to translate WMT21 test set for Icelandic. The postprocess script
```bash
#!/bin/bash
#
# USAGE postprocess.sh < input > output
#
sed 's/ //g;s/▁/ /g'
```
does not...
Hello, is there a script or an easy way to use a pre-trained model only to encode sentences? Thanks!
Half of the English sentences are empty, is this expected?
```bash
$ sacrebleu -t wmt21/dev -l is-en --echo ref | grep -c '^[[:blank:]]*$'
1004
$ sacrebleu -t wmt21/dev -l is-en...
```
The last time I worked with this, I used [OpenCC](https://pypi.org/project/OpenCC/). It is much more up to date and seems to have an active community. The last release from hanziconv is...
I have been looking at @thomedes's word list and I understand the reasons for removing words with non-ASCII characters; I understand that they do not increase entropy. However, trying...
I noticed that for most HPLT documents that CLD2 identifies as Uzbek and that are written in Cyrillic, fastText says the sentences are in other Cyrillic-script languages like `ru`, `kk`,...
Hi, I had HQ tasks that were failing but could not debug their error messages because the HQ log file was reporting this:
```
hq log hq-processing.log summary
```
was...
Is support for piping input to each job's stdin planned, like `--pipe` in GNU Parallel? Great project, thank you!
Are there any plans on supporting enums larger than u8? I'm currently developing a language identifier that has 238 variants for the language codes and will probably surpass that amount...
While adapting `cirrus-scripts` to Bitextor v8.3, I found that in the output of `docjoin` right before `bleualign` (see https://github.com/bitextor/bitextor/blob/f0d5982ac07621dee7a6d0ad3d69ce1eab58a10f/bitextor/Snakefile#L1202-L1208), the 7th column, which contains the paragraph indexes of...