
How to use the NLLB language identification model?

Open · rcv-koo opened this issue 2 years ago · 3 comments

What is your question?

How do I use the language identification (LID) model trained on Flores-200 (mentioned in the NLLB paper)? The model is available in the repo, but I can't find any documentation on how to use the LID model from code.

Also, is there a Hugging Face (HF) implementation of this?

rcv-koo, Jan 25 '23 16:01

(might also be of interest to @sheonhan)

julien-c, Jan 26 '23 11:01

+1

WilliamTambellini, Jan 26 '23 16:01

Very late, but for anyone interested: assuming you're asking about the lid218e.bin model, you can load and query it with the fasttext library:

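import fasttext
# Load the NLLB LID checkpoint (path assumes lid218e.bin is in the working directory)
fasttext_model = fasttext.load_model('lid218e.bin')
# k=3 returns the three most likely labels together with their probabilities
fasttext_model.predict("русский язык", k=3)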

Output:

(('__label__rus_Cyrl', '__label__tat_Cyrl', '__label__ukr_Cyrl'),
 array([9.72893476e-01, 2.59862132e-02, 4.44931240e-04]))
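Each label returned by predict() has the form __label__<language>_<script> (for example __label__rus_Cyrl), and fastText's predict() expects a single line of text. The sketch below shows one way to clean up input and turn the raw output into plain (language, script, probability) tuples. It is only an illustration: normalize_for_lid and parse_predictions are hypothetical helper names, not part of the fasttext API, and the sample values are copied from the output above.

```python
import re
from typing import List, Sequence, Tuple

LABEL_PREFIX = "__label__"


def normalize_for_lid(text: str) -> str:
    """Collapse all whitespace (including newlines) into single spaces.

    Hypothetical helper: fastText's predict() works on one line at a time,
    so multi-line input is flattened before prediction.
    """
    return re.sub(r"\s+", " ", text).strip()


def parse_predictions(
    labels: Sequence[str], probs: Sequence[float]
) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: convert ('__label__rus_Cyrl', ...) labels and their
    probabilities into (language, script, probability) triples."""
    results = []
    for label, prob in zip(labels, probs):
        # Strip the fastText label prefix if present
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # Split e.g. "rus_Cyrl" into language "rus" and script "Cyrl"
        lang, _, script = code.partition("_")
        results.append((lang, script, float(prob)))
    return results


# Sample output copied from the predict() call above
labels = ('__label__rus_Cyrl', '__label__tat_Cyrl', '__label__ukr_Cyrl')
probs = [9.72893476e-01, 2.59862132e-02, 4.44931240e-04]
print(parse_predictions(labels, probs)[0])  # ('rus', 'Cyrl', 0.972893476)
```

With a loaded model, the two helpers would be combined as parse_predictions(*fasttext_model.predict(normalize_for_lid(text), k=3)).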

fvolchyok, Nov 15 '23 13:11