langid.py issues

normalize langid values when working on notebook

1

Hello, could someone tell me how to normalize the values when working in a notebook? I would like an equivalent of "python langid.py -n ..." for the notebook...

Pala-beh
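
For the notebook case, a minimal sketch follows. It assumes the `norm_probs=True` argument to `LanguageIdentifier.from_modelstring` shown in the langid.py README, which is the programmatic counterpart of `-n`. The helper `normalize_log_scores` is a hypothetical name, not part of langid.py; it only illustrates the kind of normalization involved (a softmax over log-scores), and the library's own arithmetic may differ in detail.

```python
import math


def normalize_log_scores(log_scores):
    """Map unnormalized log-scores to probabilities that sum to 1 (softmax)."""
    peak = max(log_scores)
    # Subtract the maximum first so that exp() cannot underflow for every entry.
    exps = [math.exp(score - peak) for score in log_scores]
    total = sum(exps)
    return [value / total for value in exps]


def classify_normalized(text):
    """Classify `text` with langid.py and return (language, probability).

    Requires langid to be installed, so the import is deferred to call time.
    """
    from langid.langid import LanguageIdentifier, model

    identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True)
    return identifier.classify(text)
```

In a notebook cell, `classify_normalized("This is a test")` should then return a probability in the 0 to 1 range instead of a raw log-score.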

Fix simple typo: overriden -> overridden

Closes #75

timgates42

Fix simple typo: overriden -> overridden

There is a small typo in langid/train/DFfeatureselect.py. Should read `overridden` rather than `overriden`.

timgates42

JSON plaintext of data

Is it possible to publish the trained model embedded in https://raw.githubusercontent.com/saffsd/langid.py/master/langid/langid.py as a pure JSON file, for easy porting to other libraries that do the same thing?

DonaldTsang

Links to related corpus?

Where did you get the data from? And what languages are covered and by what ratio?
- JRC-Acquis
- ClueWeb 09
- Wikipedia
- Reuters RCV2
- Debian i18n

DonaldTsang

Increase the characteristics of the specified language during the run

Addresses the following scenario: during identification, the set of candidate languages can be restricted.

tjdai
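
For context, restricting the candidate set can be sketched as below. It assumes the module-level `langid.set_languages` and `langid.classify` functions described in the README. The helper `best_within` is a hypothetical name that does the same filtering on an already computed ranking of (language, score) pairs, such as the list `langid.rank` returns.

```python
def best_within(ranking, allowed):
    """Return the highest-scoring (lang, score) pair whose language is allowed."""
    allowed = set(allowed)
    candidates = [(lang, score) for lang, score in ranking if lang in allowed]
    if not candidates:
        raise ValueError("none of the allowed languages appear in the ranking")
    return max(candidates, key=lambda pair: pair[1])


def classify_restricted(text, allowed):
    """Classify `text` considering only the languages in `allowed`.

    Requires langid to be installed. Note that set_languages changes the
    state of the shared module-level identifier for all later calls.
    """
    import langid

    langid.set_languages(list(allowed))
    return langid.classify(text)
```

`best_within` leaves the global identifier untouched, which may be preferable when different calls need different restrictions.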

Get languages supported from module?

Is there a way to get a list of supported/currently set languages? I'm thinking programmatically, like:

    import langid
    print(langid.get_languages())
    >>> ['af', 'am', 'an', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br',...

kidmose
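
One possible workaround is sketched here, under the assumption that a `LanguageIdentifier` instance keeps its language codes in an `nb_classes` attribute. That is an internal detail, not a documented API, and should be checked against the installed version. `supported_languages` is a hypothetical helper, and the demo object is a stand-in so the snippet runs without langid installed.

```python
from types import SimpleNamespace


def supported_languages(identifier):
    """List the language codes an identifier can currently return.

    Assumes the codes are stored in `identifier.nb_classes` (an assumption
    about langid.py internals; verify before relying on it).
    """
    return sorted(identifier.nb_classes)


# Stand-in for illustration only. With langid installed, one would pass
# LanguageIdentifier.from_modelstring(model) from langid.langid instead.
demo_identifier = SimpleNamespace(nb_classes=["en", "de", "af"])
print(supported_languages(demo_identifier))
```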

Can you tell me how to identify those 97 languages?

2

As the README states, langid.py comes pre-trained on 97 languages. How can I reproduce that result? I tried some text in the UG language, but it was identified as ZH. I...

winter-loo

Hard-coded lookup for very short strings?

6

It's understandable that performance for very short strings is poor. Could we create a mapping with hand-assigned weights for those? I believe strings like 'yeah', 'no', 'si', 'haha', 'hehe' and...

bittlingmayer
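
The proposal can be sketched as a thin wrapper around any classifier. Everything here is hypothetical: the table entries, their weights, the `max_len` cutoff, and the function name are illustrative choices, not part of langid.py.

```python
# Hand-assigned (language, weight) pairs for very short inputs.
# The entries and weights are illustrative assumptions only.
SHORT_STRING_OVERRIDES = {
    "yeah": ("en", 1.0),
    "si": ("es", 0.8),
}


def classify_with_lookup(text, fallback, table=SHORT_STRING_OVERRIDES, max_len=5):
    """Consult the hand-built table for short strings, else call `fallback`.

    `fallback` is any callable mapping text to (language, score), for example
    langid.classify when langid is installed.
    """
    key = text.strip().lower()
    if len(key) <= max_len and key in table:
        return table[key]
    return fallback(text)
```

Usage would look like `classify_with_lookup("yeah", langid.classify)`. Longer inputs and unknown short strings go to the statistical classifier unchanged.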

a little situation

For example, in the sentence "Presidencia de la República - Mexico", the word "de" is classified as "en", but if I change it to " de ", adding spaces...

bingwork
