langdetect issues

LangDetectException - 'Need to load profiles.'

8

Hi, I sometimes get a LangDetectException which tells me: 'Need to load profiles.' Is there a way to check if all languages have been loaded before calling the detect method?...

edsvisinditus

Language detection not accurate for French text

When I use `detect` on French text, the detected language is way off. For example: bonjour -> 'hr' (Croatian), je m'appelle -> 'sl' (Slovenian)

topspinj

Timeout to abort if too long

In networking there are time constraints, so only a limited time x is available for `detect`. Not only does `timeit.timeit(lambda: detect("War doesn't s left."), number=1000)` take 34 s, the result is also 'nl' instead of English....

ghost
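
One way to approach the timeout request above, sketched with the standard library only: run the detection call in a worker thread and stop waiting after a deadline. `detect_with_timeout` and its parameters are hypothetical names, not part of langdetect; `detect_fn` would be something like `langdetect.detect`. This only bounds how long the caller waits. Python cannot kill the worker thread, so the detection itself keeps running in the background until it finishes.

```python
import concurrent.futures


def detect_with_timeout(detect_fn, text, timeout_s, default=None):
    """Return detect_fn(text), or `default` if it takes longer than timeout_s.

    Hypothetical helper: detect_fn is any callable taking a string,
    for example langdetect.detect.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(detect_fn, text)
    try:
        # Exceptions raised inside detect_fn are re-raised here.
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return default
    finally:
        # Do not block on the worker; it finishes on its own.
        executor.shutdown(wait=False)
```

A call would look like `detect_with_timeout(detect, text, 0.05)`, with `detect` imported from langdetect.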

Inaccurate predictions for basic english words

1

The library is unable to detect the language of basic English words and therefore produces inaccurate results, as shown below. `detect("sunday")` => 'id', whereas 'sunday' in Indonesian is clearly minggu...

grestonian

Added ability to limit the languages to check for

2

Added an optional list of languages to `load_profiles` to limit which profiles are loaded. The same limit is available as `detect(text, languages=[])` and `detect_langs(text, languages=[])`. The `_factory` is reloaded automatically when the language selection changes.

whnr
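
The pull request above limits which profiles are loaded. A different, simpler technique that needs no change to the library is to post-filter the ranked candidates and renormalise their probabilities. The sketch below assumes candidates expose `lang` and `prob` attributes, as the objects returned by `detect_langs` do; `Candidate` and `restrict_languages` are hypothetical names used for illustration.

```python
from collections import namedtuple

# Stand-in for the objects returned by langdetect.detect_langs,
# which carry `lang` and `prob` attributes.
Candidate = namedtuple("Candidate", ["lang", "prob"])


def restrict_languages(candidates, allowed):
    """Keep only candidates whose language is allowed, renormalised to sum to 1."""
    allowed = set(allowed)
    kept = [c for c in candidates if c.lang in allowed]
    total = sum(c.prob for c in kept)
    if total == 0:
        return []  # none of the allowed languages was among the candidates
    rescaled = [Candidate(c.lang, c.prob / total) for c in kept]
    return sorted(rescaled, key=lambda c: c.prob, reverse=True)
```

Unlike limiting the loaded profiles, this cannot recover a language the detector never proposed, so it is a weaker guarantee than the change described in the pull request.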

[Feature request] Streaming or file object support in python bindings

The original langdetect in C++ has a very nice "early abort" efficiency optimization. Could "detect" accept some form of lazy-loading (I'd suggest being able to pass a python file object),...

za3k
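
The "early abort" idea in the request above can be approximated outside the library: read the file object in chunks, score the text seen so far, and stop once the top candidate is confident enough. This is only a sketch under stated assumptions. `detect_stream`, `score_fn`, and the thresholds are hypothetical; `score_fn` stands for any callable that returns a `(lang, prob)` pair or `None`, for example a thin wrapper around `detect_langs`. Re-scoring the whole buffer on every chunk is simple but not efficient.

```python
def detect_stream(fileobj, score_fn, chunk_size=1024, threshold=0.95,
                  max_chars=65536):
    """Read text from fileobj until score_fn is confident or the input ends.

    score_fn(text) must return (lang, prob) or None. Returns the last
    result obtained, which may be below the threshold.
    """
    parts = []
    seen = 0
    best = None
    while seen < max_chars:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # end of input
        parts.append(chunk)
        seen += len(chunk)
        best = score_fn("".join(parts))
        if best is not None and best[1] >= threshold:
            break  # confident enough: stop reading early
    return best
```

The file object is left positioned just after the last chunk read, so the caller can see how much input was actually consumed.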

detect confidence of a single language

1

For example, if I only want the confidence for English, as in `detect(text, 'en')`, is this possible? I may just fork and add this feature. I realize it is a non-deterministic, possibly softmax output but...

negfrequency
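
For the single-language confidence question above, one option that needs no fork is to look the language up in the ranked candidate list. The helper below is a hypothetical sketch; it assumes candidates expose `lang` and `prob` attributes, as the objects returned by `detect_langs` do. As far as I know, `detect_langs` omits low-probability candidates, so a result of `0.0` here means "not reported", not "impossible".

```python
def confidence_for(candidates, lang):
    """Return the probability reported for `lang`, or 0.0 if it is not listed.

    `candidates` is any iterable of objects with `lang` and `prob`
    attributes, e.g. the result of langdetect.detect_langs(text).
    """
    for candidate in candidates:
        if candidate.lang == lang:
            return candidate.prob
    return 0.0
```

Usage would be along the lines of `confidence_for(detect_langs(text), 'en')`.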

Remove Chinese characters from profiles/ko to make sure Chinese is not mistakenly detected as Korean

2

In: `langdetect.detect(u'就了快速大幅')` Out: 'ko'. But the string is definitely Chinese. The problem is that there are many Chinese characters in profiles/ko, so I removed them using the script fix-ko.py.

veelion
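
The script fix-ko.py is not shown here, so the following is only a guess at the kind of filtering involved, not the author's code. It assumes a profile's n-gram table is a mapping from n-gram string to count and drops every n-gram containing a character from the CJK Unified Ideographs block (U+4E00 to U+9FFF). The function names are hypothetical, and a real fix would probably also need to adjust the profile's total n-gram counts.

```python
def contains_han(ngram):
    """True if any character lies in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in ngram)


def strip_han_ngrams(freq):
    """Return a copy of an n-gram frequency table without Han n-grams.

    `freq` maps n-gram strings to counts (assumed profile layout).
    """
    return {ngram: count for ngram, count in freq.items()
            if not contains_han(ngram)}
```

Korean text does sometimes include Hanja, so removing these n-grams trades a little Korean coverage for fewer Chinese strings misclassified as 'ko'.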

Added Kurdish dialects profiles

Profiles generated from Wikipedia abstracts ([kuwiki-20170520-abstract.xml](https://dumps.wikimedia.org/kuwiki/20170520/kuwiki-20170520-abstract.xml) and [ckbwiki-20170520-abstract.xml](https://dumps.wikimedia.org/ckbwiki/20170520/ckbwiki-20170520-abstract.xml)) for Kurmanji Kurdish (ku) and Sorani (Central) Kurdish (ckb), respectively.

rqx

Added si (Sinhalese-සිංහල) profile to langdetect

Added the si (Sinhalese) profile to langdetect. Tested for functionality.

dinal24
