
Automatic Language detection

Open Mojster opened this issue 8 years ago • 3 comments

Hi,

I entered the URL http://www.sicris.si/public/jqm/memo.aspx?lang=slv&opdescr=faq&source=evaluation.inc&opt=3&subopt=7 into Manual crawl. Automatic language detection reported "Lang: cs". It should be sl (Slovenian), not cs (Czech).

Mojster • Aug 11 '16 06:08

I found an article in your FAQ: "How the lang attribute of webpages gets detected".

So the content-detection fallback is not working properly. We'll try to work around this by setting language parameters on our test site and see how that works out.
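To make the idea concrete, here is a minimal Python sketch of "declared language first, content detection as fallback". It is not OpenSearchServer's actual code, and the real detection order there may differ. The names `declared_lang`, `detect_lang` and `content_fallback` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser
from typing import Callable, Optional


class _LangAttrParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            value = (dict(attrs).get("lang") or "").strip().lower()
            if value:
                self.lang = value


def declared_lang(html: str) -> Optional[str]:
    """Return the language declared on <html lang="...">, or None."""
    parser = _LangAttrParser()
    parser.feed(html)
    return parser.lang


def detect_lang(html: str, content_fallback: Callable[[str], str]) -> str:
    """Prefer the declared language; otherwise ask the (hypothetical)
    content-based detector passed in as content_fallback."""
    lang = declared_lang(html)
    return lang if lang is not None else content_fallback(html)
```

With a page that declares `<html lang="sl">`, the content-based detector is never consulted, so a missing Slovenian profile in that detector would no longer matter for such pages.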

Mojster • Aug 12 '16 05:08

One option is to put a language parameter in the HTML documents. Then it detects SL. But in the results it returns English as the first result.

I think I could solve this with the language parameter in the query, but its list of languages does not contain Slovenian.

Would it be possible to add it?

Mojster • Jan 03 '17 12:01

Let me turn my question around. Would you add the Slovene language to your "ngram detection"?

In issue #1822 you once gave me instructions on how to add a Slovene lemmatizer, which I did. But how can I use it? And am I right that it is not connected with the "ngram detection"?

Mojster • Feb 20 '18 10:02