pluiefox

Results: 12 comments by pluiefox

@314esther @Suhail Hi, you can check [here](https://github.com/pluiez/NLLB-inference) for a convenient script to run model inference from the command line without having to deal with the config files.

> Hi, sorry I didn't take this into consideration. I'm assuming these tools are all pre-installed. I will list the required steps before running the script.

@amrrs Thank you for sharing. Actually I hard-coded the language passed to normalize_punctuation.sh in translate.sh as zho_Hans. Although many languages share the English (en) normalization rules under the hood, Tamil uses the Hindi (hi) ones....
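A minimal sketch of what such a code-to-normalization mapping could look like. The table and function names below are a hypothetical illustration, not the actual contents of translate.sh:

```python
# Hypothetical mapping from NLLB language codes to the language argument
# expected by Moses-style normalize_punctuation.sh. Only a few entries
# are shown; the real script may differ.
MOSES_NORM_LANG = {
    "zho_Hans": "zh",
    "eng_Latn": "en",
    "hin_Deva": "hi",
    "tam_Taml": "hi",  # Tamil reuses the Hindi normalization rules
}

def norm_lang(nllb_code: str) -> str:
    # Fall back to the English rules, which many languages share.
    return MOSES_NORM_LANG.get(nllb_code, "en")
```

With this in place, translate.sh could pass `norm_lang` output to the normalizer instead of a hard-coded value.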

> I thought NLLB uses a byte-level sentencepiece. Am I wrong? Is the dict you talked about this one: https://dl.fbaipublicfiles.com/large_objects/nllb/models/spm_200/dictionary.txt ?
>
> Since it is a byte-level dictionary, there...

> Thank you for your nice explanation! Does this mean that the model may need fine-tuning on an extended vocabulary including the missing byte chars to fix this problem?
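For context, byte-level fallback in sentencepiece represents a character outside the learned pieces as its raw UTF-8 bytes, each rendered as a `<0xXX>` piece. A small sketch in plain Python (not the sentencepiece API itself):

```python
def byte_fallback_pieces(char: str) -> list:
    # Render a character as the <0xXX> byte pieces that sentencepiece's
    # byte-fallback mode uses for characters missing from the vocabulary.
    return ["<0x{:02X}>".format(b) for b in char.encode("utf-8")]

# An ASCII character is a single byte piece, while e.g. a Tamil letter
# decomposes into three byte pieces.
ascii_pieces = byte_fallback_pieces("A")   # ["<0x41>"]
tamil_pieces = byte_fallback_pieces("த")   # ["<0xE0>", "<0xAE>", "<0xA4>"]
```

If some of those `<0xXX>` entries were dropped from the dictionary, any character whose UTF-8 encoding needs them becomes untokenizable, which is why extending the vocabulary (and fine-tuning the new embeddings) could be required.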

> There are three additional tokens in the vocabulary that we add during training. Here is a response related to this:
>
> [huggingface/transformers#18043 (comment)](https://github.com/huggingface/transformers/issues/18043#issuecomment-1179317930)
>
> More specifically...
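When tokens are added to a vocabulary after (or during) training, the embedding matrix has to grow to match. A toy sketch with plain lists, purely illustrative — real frameworks do this on the weight tensor (e.g. `model.resize_token_embeddings` in transformers):

```python
import random

def extend_embeddings(emb, num_new, dim, seed=0):
    # emb: list of per-token embedding vectors.
    # Append randomly initialised rows for the newly added tokens;
    # these rows would then be learned during fine-tuning.
    rng = random.Random(seed)
    new_rows = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                for _ in range(num_new)]
    return emb + new_rows

# Hypothetical example: three extra training-time tokens on a tiny table.
emb = [[0.0] * 4 for _ in range("abc")] if False else [[0.0] * 4 for _ in range(3)]
emb = extend_embeddings(emb, num_new=3, dim=4)
```

The point is only the shape bookkeeping: vocabulary size and embedding row count must stay in sync, which is the mismatch the linked issue is about.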

@gmryu Really appreciate your explanation, now it's all clear to me.

That pretty much depends on which device and which checkpoint you use. The smallest checkpoint has 600M parameters; that is already quite large compared to some commonly used pretrained models....

Hi, I have updated the prerequisites for running the script, including the sentencepiece command-line tools.

Hi, buffer-size just needs to be set large enough to be greater than batch-size; it does not affect the generation result. Fairseq has a max_position=512 limit, so maybe one of your input sentences...
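A hedged sketch of pre-filtering over-long inputs before generation. The word-split here is only a crude proxy — fairseq's limit applies to subword (sentencepiece) tokens, and the 512 figure is the default, not a universal constant:

```python
MAX_POSITIONS = 512  # fairseq's default maximum source positions

def split_by_length(sentences, tokenize=str.split, limit=MAX_POSITIONS):
    # Partition inputs into those that fit within the position limit
    # and those that would need truncation or sentence splitting.
    ok, too_long = [], []
    for s in sentences:
        (ok if len(tokenize(s)) <= limit else too_long).append(s)
    return ok, too_long

ok, dropped = split_by_length(["a short sentence", "word " * 600])
```

For a real check you would tokenize with the same sentencepiece model used for translation, since subword counts can be much larger than word counts.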