nlp_primitives
nlp_primitives copied to clipboard
Add Language primitive
- We can use this library to detect the language of a text
- https://github.com/facebookresearch/fastText/
- https://fasttext.cc/docs/en/language-identification.html
import fasttext
filepath = os.path.join(FILE_PATH, "../data/lid.176.ftz")
model = fasttext.load_model(filepath)
predictions = self.model.predict(text="hej", k=2) # returns top 2 matching languages
predictions
The object returned by the model is of the form ((‘__label__pl’, ‘__label__sv’), array([0.40688798, 0.23321952])) of <class 'tuple'> where pl and sv are the ISO 639 code for Polish and Swedish. The prediction for both languages is correct as Hej means Hello in both languages. The second part indicates the respective confidence of the sentence belonging to those languages.
Can I pick this up?
Yes, we can tackle this after the other, currently assigned NLP primitives