langid.py

Stand-alone language identification system

Results 29 langid.py issues

Hello, could someone tell me how to normalize the values when working in a notebook? I would like an equivalent of "python langid.py -n ..." for the notebook...
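As a rough illustration of what normalization means here, the sketch below turns unnormalized log-scores into probabilities that sum to 1, using the log-sum-exp trick. The scores and the helper name `normalize_log_scores` are made up for the example and are not taken from langid.py. Within the library itself, the README describes building an identifier with `LanguageIdentifier.from_modelstring(model, norm_probs=True)`, which is probably the closest notebook equivalent of the `-n` flag.

```python
import math


def normalize_log_scores(log_scores):
    """Map unnormalized log-scores to probabilities that sum to 1.

    Hypothetical helper for illustration; not part of langid.py.
    """
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_scores.values())
    exps = {lang: math.exp(score - peak) for lang, score in log_scores.items()}
    total = sum(exps.values())
    return {lang: value / total for lang, value in exps.items()}


# Made-up raw scores for three candidate languages.
raw = {"en": -54.4, "fr": -60.1, "de": -63.0}
probs = normalize_log_scores(raw)
best = max(probs, key=probs.get)
print(best, round(probs[best], 4))
```

The ranking of languages is unchanged by normalization; only the scale of the reported confidence differs.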

There is a small typo in langid/train/DFfeatureselect.py. It should read `overridden` rather than `overriden`.

Is it possible to provide the trained model data in https://raw.githubusercontent.com/saffsd/langid.py/master/langid/langid.py as a pure JSON file, for easy porting to other libraries that do the same thing?

Where did you get the data from? And what languages are covered and by what ratio?
- JRC-Acquis
- ClueWeb 09
- Wikipedia
- Reuters RCV2
- Debian i18n

This addresses the following scenario: during identification, the set of candidate languages can be restricted.
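A minimal sketch of the idea, assuming per-language scores are already available as a dictionary: candidates outside the allowed set are dropped before the best one is picked. The function name `classify_restricted` and the scores are hypothetical. In langid.py itself, the README documents `langid.set_languages([...])` for constraining the language set.

```python
def classify_restricted(scores, allowed=None):
    """Pick the best-scoring language, optionally limited to `allowed`.

    Hypothetical helper for illustration; not part of langid.py.
    """
    if allowed is not None:
        unknown = set(allowed) - set(scores)
        if unknown:
            raise ValueError(f"unsupported languages: {sorted(unknown)}")
        # Keep only the candidates the caller asked for.
        scores = {lang: s for lang, s in scores.items() if lang in allowed}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up scores: higher is better.
scores = {"en": -40.2, "nl": -41.0, "de": -45.5}
unrestricted = classify_restricted(scores)
restricted = classify_restricted(scores, allowed=["nl", "de"])
print(unrestricted, restricted)
```

Restricting the candidates cannot make the correct language score higher, but it removes close competitors that are known to be irrelevant for the application.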

Is there a way to get a list of the supported or currently set languages? I'm thinking programmatically, like: `import langid; print(langid.get_languages())` >>> ['af', 'am', 'an', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br',...

As stated in the README, langid.py comes pre-trained on 97 languages. How can I reproduce that result? I tried it on UG-language text, but it was identified as ZH. I...

It's understandable that performance for very short strings is poor. Could we create a mapping with hand-assigned weights for those? I believe strings like 'yeah', 'no', 'si', 'haha', 'hehe' and...
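The proposal could look roughly like the sketch below: a lookup table of short strings with hand-assigned weights is consulted first, and anything not in the table goes to the regular classifier. The table contents, the weights, the `max_len` threshold, and the names `SHORT_STRING_WEIGHTS` and `classify_short` are all assumptions made for illustration; `fallback` stands in for whatever classifier is actually used.

```python
# Illustrative, hand-assigned weights; the values are not from langid.py.
SHORT_STRING_WEIGHTS = {
    "yeah": {"en": 1.0},
    "no": {"en": 0.6, "es": 0.4},
    "si": {"es": 0.6, "it": 0.4},
}


def classify_short(text, fallback, max_len=5):
    """Use the hand-made table for short strings, else call `fallback`.

    `fallback` is any callable taking text and returning (lang, score).
    """
    key = text.strip().lower()
    if len(key) <= max_len and key in SHORT_STRING_WEIGHTS:
        weights = SHORT_STRING_WEIGHTS[key]
        lang = max(weights, key=weights.get)
        return lang, weights[lang]
    return fallback(text)


def stub_classifier(text):
    """Stand-in for a real classifier, used only for this example."""
    return ("xx", 0.0)


print(classify_short("Yeah", stub_classifier))
print(classify_short("si", stub_classifier))
print(classify_short("a longer sentence", stub_classifier))
```

Language-neutral strings such as 'haha' would need a separate convention, for example returning no language at all, which this sketch does not attempt.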

For example, in the sentence "Presidencia de la República - Mexico", the word "de" will be classified as "en", but if I change it to " de ", as add space...