nlp_primitives Add Language primitive

We can use the fastText library to detect the language of a text:
- https://github.com/facebookresearch/fastText/
- https://fasttext.cc/docs/en/language-identification.html

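import os

import fasttext

# Assumption: FILE_PATH is the directory containing the current module
FILE_PATH = os.path.dirname(os.path.abspath(__file__))
filepath = os.path.join(FILE_PATH, "../data/lid.176.ftz")
model = fasttext.load_model(filepath)
predictions = model.predict(text="hej", k=2)  # returns the top 2 matching languages
predictions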

The model returns a tuple of the form (('__label__pl', '__label__sv'), array([0.40688798, 0.23321952])), where pl and sv are the ISO 639 codes for Polish and Swedish. Both predictions are correct, since "hej" means "hello" in both languages. The second element holds the model's confidence for each predicted language, in the same order as the labels.
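A primitive would likely want plain language codes rather than fastText's raw labels. Below is a minimal sketch of post-processing the returned tuple. The helper names parse_predictions and top_language and the min_confidence cut-off are hypothetical; the sketch only assumes the (labels, probabilities) shape described above, with labels ordered from most to least likely.

```python
from typing import List, Optional, Sequence, Tuple

LABEL_PREFIX = "__label__"  # prefix fastText prepends to every predicted label


def parse_predictions(
    predictions: Tuple[Sequence[str], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Turn fastText's (labels, probabilities) tuple into (language_code, confidence) pairs."""
    labels, probabilities = predictions
    pairs = []
    for label, prob in zip(labels, probabilities):
        # Strip the "__label__" prefix so only the ISO 639 code remains
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        pairs.append((code, float(prob)))  # float() also converts numpy scalars
    return pairs


def top_language(
    predictions: Tuple[Sequence[str], Sequence[float]],
    min_confidence: float = 0.0,  # hypothetical cut-off, not part of fastText
) -> Optional[str]:
    """Return the most likely language code, or None if it is below min_confidence."""
    pairs = parse_predictions(predictions)
    if not pairs or pairs[0][1] < min_confidence:
        return None
    return pairs[0][0]


# Usage with the example output for "hej"
example = (("__label__pl", "__label__sv"), [0.40688798, 0.23321952])
print(parse_predictions(example))  # [('pl', 0.40688798), ('sv', 0.23321952)]
print(top_language(example, min_confidence=0.5))  # None
```

Returning None below a threshold is one way to avoid labeling short, ambiguous strings like "hej" with a confident-looking single language; whether the primitive should do this is a design decision left open here.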

May 11 '22 21:05 gsheni

Can I pick this up?

Jul 02 '22 18:07 sbadithe

Yes, we can tackle this after the other currently assigned NLP primitives.

Jul 02 '22 22:07 gsheni