langid.py

Stand-alone language identification system

Results 29 langid.py issues

Hello, It seems that `langid` uses multiprocessing under the hood to make classification faster. Is there any way in Python to force `langid` to use a single process (turn off...

The text `"Bestias desagradables. No sé por qué acepté apostar"` should be classified as Spanish, but it is instead classified as Portuguese with high confidence. If you remove the accents,...

When I classify "Hello China" with `print(langid.classify("Hello China"))`, the result is `('it', -37.309250354766846)`. @Paczesiowa @pquentin @martinth @jnothman @saffsd
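For context on the score quoted above: the langid.py README describes the value returned by `classify` as an unnormalized log-probability by default, so `-37.3` is not a confidence between 0 and 1. Below is a minimal sketch of how such log-scores could be normalized with a softmax, using only the standard library. The `en` and `es` scores and the helper `normalize_log_scores` are made up for illustration (only the `it` value comes from the issue), and this is not langid.py's own code.

```python
import math

def normalize_log_scores(log_scores):
    """Convert unnormalized log-scores into probabilities that sum to 1."""
    # Subtract the maximum first so exp() does not underflow to zero everywhere.
    peak = max(log_scores.values())
    exps = {lang: math.exp(score - peak) for lang, score in log_scores.items()}
    total = sum(exps.values())
    return {lang: value / total for lang, value in exps.items()}

# Hypothetical scores; only the 'it' value appears in the issue above.
scores = {"it": -37.309250354766846, "en": -38.0, "es": -41.5}
probs = normalize_log_scores(scores)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

In langid.py itself, the README shows `LanguageIdentifier.from_modelstring(model, norm_probs=True)` as the way to get normalized probabilities. A short two-word input may still be misclassified either way.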

I have the following text that is a mix of `english` and `sesotho`: ``` >>>Ska rebona re phela\nKgale re sokola rona re phelela mmino\nO skang potja ka dilo\nKgale re sokola...

Hi, I just stumbled over langid and then, when trying how suitable it'd be for my purposes, stumbled over this: ``` ❯ echo 'در' | langid -l ar,fa,ota Traceback (most...

When running batch training with the `-d` flag, the following error is raised: line 585, in main writer.writerow(['path']+nb_classes) NameError: name 'nb_classes' is not defined Looks like there is a misplaced variable assignment....

From the README files, I found out how to train a brand new model. Can we add a new corpus to the default model rather than training a brand new one?

It runs without problems on my local Windows machine (Python 3.6), but on an Ubuntu server (Python 3.7) it reports the following error: `Traceback (most recent call last): File "lan_det.py", line 9, in print(lan_det(text)) File "lan_det.py", line 6, in lan_det return langid.classify(text) File "/home/env/rfh_01/lib/python3.7/site-packages/langid-1.1.6-py3.7.egg/langid/langid.py", line 105, in classify load_model() File "/home/env/rfh_01/lib/python3.7/site-packages/langid-1.1.6-py3.7.egg/langid/langid.py", line...

If word 3-grams are set in tokenize.py, is the unit of max_order in DFfeatureselect.py the word or the byte? I ask because in some languages one string takes up several bytes.
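To illustrate why the unit matters, here is a small sketch that counts word, character and UTF-8 byte 3-grams for the same short string. The helper `ngrams` and the sample text are hypothetical and are not taken from tokenize.py or DFfeatureselect.py; the sketch does not settle which unit max_order actually uses.

```python
def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as a list of slices."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

# Sample text "No sé por qué"; U+00E9 (é) takes 2 bytes in UTF-8.
text = "No s\u00e9 por qu\u00e9"
words = text.split()
raw = text.encode("utf-8")

word_3grams = ngrams(words, 3)   # units are whole words
char_3grams = ngrams(text, 3)    # units are Unicode code points
byte_3grams = ngrams(raw, 3)     # units are raw bytes

print(len(words), len(text), len(raw))
print(len(word_3grams), len(char_3grams), len(byte_3grams))
```

The same order of 3 covers very different spans of text depending on the unit, and the gap between characters and bytes widens for scripts whose characters need more bytes each in UTF-8.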

![image](https://user-images.githubusercontent.com/5344333/75847841-52c9e800-5e06-11ea-9a9c-2f6d43041e7b.png) Any plans to add support for it?