elasticsearch-analysis-baseform

Case sensitive

Open · leabaertschi opened this issue 10 years ago · 4 comments

I'm using this plugin for German text, and it seems to be case-sensitive. Is that right? If so, what's the reason for that?

leabaertschi, May 19 '14 09:05

In German, a few words (not many) have different meanings depending on whether they are written in upper- or lowercase.

Example:

Rasen = grass
rasen = to dash, to rush

jprante, May 19 '14 10:05

Yeah, that's true :S. How hard would it be to adapt the files for German and build the plugin ourselves?

leabaertschi, May 19 '14 11:05

Just fork it and feel free to modify the files in https://github.com/jprante/elasticsearch-analysis-baseform/tree/master/src/main/resources to suit your requirements ;-)

N.B. For lowercasing (at the cost of some ambiguities), you could simply combine this baseform analyzer with a lowercase filter.

jprante, May 19 '14 11:05
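A minimal sketch of why filter order matters when a case-sensitive baseform lookup is combined with a lowercase filter. This is plain Python, not the plugin's code: `BASEFORMS`, `baseform`, `lowercase`, and `analyze` are hypothetical stand-ins, assuming the baseform step behaves like an exact-match dictionary lookup.

```python
# Hypothetical toy dictionary standing in for the plugin's German word lists.
BASEFORMS = {
    "Tomaten": "Tomate",  # noun, capitalized
    "Rasen": "Rasen",     # noun: grass
    "rasen": "rasen",     # verb: to dash, to rush
}

def baseform(token: str) -> str:
    """Exact-match (case-sensitive) lookup; unknown tokens pass through."""
    return BASEFORMS.get(token, token)

def lowercase(token: str) -> str:
    return token.lower()

def analyze(tokens, filters):
    """Apply each token filter in order, like an analyzer's filter chain."""
    for f in filters:
        tokens = [f(t) for t in tokens]
    return tokens

# Baseform first, then lowercase: capitalized input is reduced correctly.
print(analyze(["Tomaten"], [baseform, lowercase]))  # ['tomate']
# Input that is already lowercase misses the case-sensitive lookup.
print(analyze(["tomaten"], [baseform, lowercase]))  # ['tomaten']
# Lowercasing before the lookup has the same effect.
print(analyze(["Tomaten"], [lowercase, baseform]))  # ['tomaten']
```

In this model, lowercasing after the lookup only normalizes words the dictionary already recognized in their original case; it cannot help with tokens that arrive in lowercase.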

Actually, our problem is that users might enter the search string in all lowercase, and the plugin then cannot convert it into its base form. The second problem is that we use this plugin together with the decompound plugin, which returns the tokens in lowercase, and in some cases it does not return them in their base form. For example, Fleischtomaten is split into fleisch and tomate, but Datteltomaten is split into dattel and tomaten (with the trailing n), and the baseform plugin then cannot convert tomaten into its base form because it is lowercase.

leabaertschi, May 19 '14 12:05
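One possible workaround for lowercase tokens such as tomaten, sketched in Python under the assumption that the baseform step is a dictionary lookup. All names here (`BASEFORMS`, `build_folded_index`, `lookup`) are hypothetical and not part of the plugin: the exact-case lookup is tried first, and only when it misses does the code fall back to an index keyed by lowercased forms. Where the lowercased key is ambiguous, as with Rasen and rasen, it returns every candidate rather than guessing.

```python
from collections import defaultdict

# Hypothetical toy dictionary: surface form -> base form.
BASEFORMS = {
    "Tomaten": "Tomate",
    "Rasen": "Rasen",  # grass
    "rasen": "rasen",  # to dash, to rush
}

def build_folded_index(forms):
    """Map each lowercased surface form to the set of possible base forms."""
    index = defaultdict(set)
    for surface, base in forms.items():
        index[surface.lower()].add(base)
    return index

FOLDED = build_folded_index(BASEFORMS)

def lookup(token):
    """Return a sorted list of candidate base forms for token."""
    if token in BASEFORMS:          # exact match keeps the case distinction
        return [BASEFORMS[token]]
    candidates = FOLDED.get(token.lower())
    if candidates:                  # case-insensitive fallback
        return sorted(candidates)
    return [token]                  # unknown word: pass through unchanged

print(lookup("tomaten"))  # ['Tomate']
print(lookup("rasen"))    # ['rasen']
print(lookup("RASEN"))    # ['Rasen', 'rasen']
```

In an actual token filter, ambiguous candidates could be emitted as extra tokens at the same position and then lowercased; whether the resulting loss of precision is acceptable depends on the index and the queries.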