fairseq
How to use the Language Identification Model of NLLB?
What is your question?
How do I use the language identification (LID) model trained on Flores-200 that is mentioned in the NLLB paper? The model is provided in the repo, but I can't find any code showing how to use it.
Also, is there a Hugging Face (HF) implementation of it?
(might also be of interest to @sheonhan)
+1
Very late, but for anyone interested: assuming you're asking about the lid218e.bin model, you can load it with the fasttext library:
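import fasttext
# Load the LID model (assumes lid218e.bin is in the working directory)
fasttext_model = fasttext.load_model('lid218e.bin')
# Return the 3 most likely language labels with their probabilities
fasttext_model.predict("русский язык", k=3)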
Output:
(('__label__rus_Cyrl', '__label__tat_Cyrl', '__label__ukr_Cyrl'),
array([9.72893476e-01, 2.59862132e-02, 4.44931240e-04]))
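If it helps, here is a minimal sketch of how the raw output above could be turned into something easier to work with. The helper names `clean_text` and `parse_predictions` are hypothetical, not part of fasttext or fairseq. It assumes the labels always look like `__label__<lang>_<Script>`, as in the output above, and it relies on the fact that fasttext's `predict` rejects input containing newlines, so the text is flattened to one line first.

```python
from typing import List, Sequence, Tuple

LABEL_PREFIX = "__label__"  # prefix seen on the labels in the output above


def clean_text(text: str) -> str:
    """Collapse all whitespace (including newlines) into single spaces.

    fasttext's predict() processes one line at a time and raises an error
    if the input string contains a newline.
    """
    return " ".join(text.split())


def parse_predictions(
    labels: Sequence[str], probs: Sequence[float]
) -> List[Tuple[str, str, float]]:
    """Turn (labels, probabilities) into (language, script, probability) tuples.

    Hypothetical helper: assumes labels of the form '__label__rus_Cyrl'.
    """
    results = []
    for label, prob in zip(labels, probs):
        # Drop the '__label__' prefix if it is present
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # Split 'rus_Cyrl' into language code and script code
        lang, _, script = code.partition("_")
        results.append((lang, script, float(prob)))
    return results


# Usage with the model loaded as in the snippet above (not run here):
#   labels, probs = fasttext_model.predict(clean_text(some_text), k=3)
#   for lang, script, prob in parse_predictions(labels, probs):
#       print(lang, script, prob)
```

The `(language, script)` split is only a convenience; keep the full `rus_Cyrl` style code if you need to pass it on to the NLLB translation models.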