David Dale

74 comments by David Dale

In my experience, when running the command locally, you may also need to export the `USER` environment variable to please Hydra. So the full command could look like ```...

> What should I indicate in tgt_lang for the unseen language?

You can assign any name you want to the new language. If this name is `abc`, then you will...

Hi @gegallego, thank you for this remark! Yes, indeed, the LayerNorm implementation that is used here computes the variance without Bessel's correction (https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html#torch.nn.LayerNorm), and so should we. I'll update the...
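To make the difference concrete, here is a small standard-library sketch of layer normalization over a single vector. It is only an illustration, not the code from the repository under discussion: the function name `layer_norm` and the sample vector are made up, and `eps=1e-5` is assumed because it is the default in `torch.nn.LayerNorm`. The biased estimator divides by N, whereas Bessel's correction divides by N - 1.

```python
import math
import statistics

def layer_norm(xs, eps=1e-5):
    """Normalize one vector using the biased (population) variance."""
    mean = statistics.fmean(xs)
    # pvariance divides by N, i.e. no Bessel's correction.
    var = statistics.pvariance(xs, mu=mean)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

xs = [1.0, 2.0, 3.0, 4.0]
biased = statistics.pvariance(xs)   # sum of squared deviations / N
unbiased = statistics.variance(xs)  # sum of squared deviations / (N - 1)
normalized = layer_norm(xs)
```

For short vectors the two estimators differ noticeably (here by a factor of 4/3), so mixing them would produce outputs that do not match the reference LayerNorm.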

The NLLB model has been trained to translate not words, but sentences. This is a tradition in machine translation. Texts longer than a sentence used to be too difficult for...

All the NLLB models were trained mostly on single-sentence translation, and they are by no means guaranteed to correctly translate multiple-sentence texts. Thus, the safest recommendation is to split the...
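A minimal sketch of sentence-by-sentence translation, under loud assumptions: the regex splitter below is deliberately naive (it will break on abbreviations such as "e.g."), and `translate_sentence` is a hypothetical callable standing in for whatever single-sentence translation call you actually use. Neither name comes from NLLB or any specific library.

```python
import re
from typing import Callable, List

def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def translate_text(text: str, translate_sentence: Callable[[str], str]) -> str:
    """Translate each sentence independently and join the results."""
    # translate_sentence is a hypothetical stand-in for a real model call.
    return " ".join(translate_sentence(s) for s in split_sentences(text))
```

For real texts, a proper sentence segmenter for the source language would likely be a better choice than this regex.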

I found similar problems in several more lines. E.g. line 1143 (668 if newlines in the texts are quoted) in tsd_trial.csv is `"[5, 6, 7, 8, 9, 10, 11, 12,...

Hi @Nayjest! Could you please compute the numbers yourself and submit a pull request with them?

Hi @potat-dev ! Could you please compute the scores on your own and add them as a pull request to this repo?

I am curious: which of the tasks needs so much memory? Is it possible to optimize it? The SONAR text encoder itself occupies only about 3 GB on disk. With a small...

I did evaluate SONAR encoders (with a few assumptions about language codes). However, even after adding the output of `scripts/mteb_meta.py` to [the model's readme](https://huggingface.co/facebook/SONAR), I cannot see SONAR on the...