nlp_primitives

Add Language primitive

Open • gsheni opened this issue 2 years ago • 2 comments

  • We can use the fastText library to detect the language of a text:
    • https://github.com/facebookresearch/fastText/
    • https://fasttext.cc/docs/en/language-identification.html
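import os

import fasttext

FILE_PATH = os.path.dirname(os.path.abspath(__file__))  # assumed: directory of the calling module
filepath = os.path.join(FILE_PATH, "../data/lid.176.ftz")  # compressed language-identification model
model = fasttext.load_model(filepath)
predictions = model.predict("hej", k=2)  # returns the top 2 matching languages
print(predictions)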

The model returns a tuple of the form (('__label__pl', '__label__sv'), array([0.40688798, 0.23321952])), where pl and sv are the ISO 639 codes for Polish and Swedish. Both predictions are plausible, since "hej" means "hello" in both languages. The second element holds the model's confidence that the text belongs to each of those languages, in the same order as the labels.
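To make the output above usable as a feature, a Language primitive would need to strip the "__label__" prefix and pick the top language. The sketch below shows one possible way to do that; it is not the nlp_primitives API, and the names parse_prediction, detect_language and min_score are hypothetical. The model's predict is passed in as a callable (assumed to behave like lambda t: model.predict(t, k=1)) so the logic can be exercised without downloading lid.176.ftz. Newlines are replaced because fastText's predict works on one line at a time and rejects text containing newline characters.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Shape of fastText's single-text predict output: (labels, scores),
# e.g. (('__label__pl', '__label__sv'), array([0.40688798, 0.23321952]))
Prediction = Tuple[Sequence[str], Sequence[float]]

LABEL_PREFIX = "__label__"  # prefix fastText adds to every predicted label


def parse_prediction(prediction: Prediction) -> List[Tuple[str, float]]:
    """Turn fastText output into (language code, score) pairs, best first."""
    labels, scores = prediction
    pairs = []
    for label, score in zip(labels, scores):
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        pairs.append((code, float(score)))
    return pairs


def detect_language(
    text: str,
    predict: Callable[[str], Prediction],
    min_score: float = 0.0,  # hypothetical confidence cut-off
) -> Optional[str]:
    """Return the most likely language code for text, or None.

    Empty or whitespace-only text yields None, as does a top score
    below min_score. Newline characters are replaced with spaces
    before calling predict.
    """
    if not text or not text.strip():
        return None
    cleaned = text.replace("\n", " ")
    pairs = parse_prediction(predict(cleaned))
    if not pairs or pairs[0][1] < min_score:
        return None
    return pairs[0][0]


if __name__ == "__main__":
    # Values taken from the example output in this issue
    example = (("__label__pl", "__label__sv"), [0.40688798, 0.23321952])
    print(parse_prediction(example))  # [('pl', 0.40688798), ('sv', 0.23321952)]
```

With a real model, one would call detect_language(text, lambda t: model.predict(t, k=1)). Injecting predict rather than loading the model inside the function is just one design choice; it keeps the parsing logic testable and leaves model loading (and caching) to the primitive itself.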

gsheni, May 11 '22 21:05

Can I pick this up?

sbadithe, Jul 02 '22 18:07

Yes, we can tackle this after the other NLP primitives that are currently assigned.

gsheni, Jul 02 '22 22:07