pluiefox

Results: 12 comments by pluiefox

@314esther @Suhail Hi, you can check [here](https://github.com/pluiez/NLLB-inference) for a convenient script to run model inference from the command line without having to deal with the config files.

> Hi, sorry I didn't take this into consideration. I'm assuming these tools are all pre-installed. I will list the required steps before running the script.

@amrrs Thank you for sharing. Actually I hard-coded the language passed to normalize_punctuation.sh in translate.sh as zho_Hans. Although many languages share the English (en) normalization rules under the hood, Tamil uses the Hindi (hi) ones....
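A minimal sketch of what such a code-to-normalization mapping could look like. The table and function names below are a hypothetical illustration, not the actual contents of translate.sh:

```python
# Hypothetical mapping from NLLB language codes to the language argument
# expected by Moses-style normalize_punctuation.sh. Only a few entries
# are shown; the real script may differ.
MOSES_NORM_LANG = {
    "zho_Hans": "zh",
    "eng_Latn": "en",
    "hin_Deva": "hi",
    "tam_Taml": "hi",  # Tamil reuses the Hindi normalization rules
}

def norm_lang(nllb_code: str) -> str:
    # Fall back to the English rules, which many languages share.
    return MOSES_NORM_LANG.get(nllb_code, "en")
```

With this in place, translate.sh could pass `norm_lang` output to the normalizer instead of a hard-coded value.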

> I thought NLLB uses a byte-level sentencepiece. Am I wrong? Is the dict you talked about this one: https://dl.fbaipublicfiles.com/large_objects/nllb/models/spm_200/dictionary.txt ?
>
> Since it is a byte-level dictionary, there...

> Thank you for your nice explanation! Does this mean that the model may need fine-tuning on an extended vocabulary including the missing byte chars to fix this problem?
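For context, byte-level fallback in sentencepiece represents a character outside the learned pieces as its raw UTF-8 bytes, each rendered as a `<0xXX>` piece. A small sketch in plain Python (not the sentencepiece API itself):

```python
def byte_fallback_pieces(char: str) -> list:
    # Render a character as the <0xXX> byte pieces that sentencepiece's
    # byte-fallback mode uses for characters missing from the vocabulary.
    return ["<0x{:02X}>".format(b) for b in char.encode("utf-8")]

# An ASCII character is a single byte piece, while e.g. a Tamil letter
# decomposes into three byte pieces.
ascii_pieces = byte_fallback_pieces("A")   # ["<0x41>"]
tamil_pieces = byte_fallback_pieces("த")   # ["<0xE0>", "<0xAE>", "<0xA4>"]
```

If some of those `<0xXX>` entries were dropped from the dictionary, any character whose UTF-8 encoding needs them becomes untokenizable, which is why extending the vocabulary (and fine-tuning the new embeddings) could be required.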

> There are three additional tokens in the vocabulary that we add during training. Here is a response related to this:
>
> [huggingface/transformers#18043 (comment)](https://github.com/huggingface/transformers/issues/18043#issuecomment-1179317930)
>
> More specifically...
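When tokens are added to a vocabulary after (or during) training, the embedding matrix has to grow to match. A toy sketch with plain lists, purely illustrative — real frameworks do this on the weight tensor (e.g. `model.resize_token_embeddings` in transformers):

```python
import random

def extend_embeddings(emb, num_new, dim, seed=0):
    # emb: list of per-token embedding vectors.
    # Append randomly initialised rows for the newly added tokens;
    # these rows would then be learned during fine-tuning.
    rng = random.Random(seed)
    new_rows = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                for _ in range(num_new)]
    return emb + new_rows

# Hypothetical example: three extra training-time tokens on a tiny table.
emb = [[0.0] * 4 for _ in range("abc")] if False else [[0.0] * 4 for _ in range(3)]
emb = extend_embeddings(emb, num_new=3, dim=4)
```

The point is only the shape bookkeeping: vocabulary size and embedding row count must stay in sync, which is the mismatch the linked issue is about.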

@gmryu Really appreciate your explanation, now it's all clear to me.

That pretty much depends on which device and which checkpoint you use. The smallest checkpoint has 600M parameters; that is already quite large compared to some commonly used pretrained models....

Hi, I have updated the prerequisites for running the script, including the sentencepiece command-line tools.

Hi, buffer-size just needs to be set large enough to be greater than batch-size; it does not affect the generation result. Fairseq has a max_position=512 limit, so maybe one of your input sentences...
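A hedged sketch of pre-filtering over-long inputs before generation. The word-split here is only a crude proxy — fairseq's limit applies to subword (sentencepiece) tokens, and the 512 figure is the default, not a universal constant:

```python
MAX_POSITIONS = 512  # fairseq's default maximum source positions

def split_by_length(sentences, tokenize=str.split, limit=MAX_POSITIONS):
    # Partition inputs into those that fit within the position limit
    # and those that would need truncation or sentence splitting.
    ok, too_long = [], []
    for s in sentences:
        (ok if len(tokenize(s)) <= limit else too_long).append(s)
    return ok, too_long

ok, dropped = split_by_length(["a short sentence", "word " * 600])
```

For a real check you would tokenize with the same sentencepiece model used for translation, since subword counts can be much larger than word counts.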