elasticsearch-analysis-baseform
Case sensitive
I'm using this plugin for German text and it seems to be case sensitive. Is that the case? If so, what's the reason for that?
In German there are a few words (not many) whose meaning differs depending on whether they are written upper- or lowercase.
Example:
Rasen = grass
rasen = to dash, to rush
Yeah, that's true :S. How hard would it be to adapt the files for German and build the plugin ourselves?
Just fork and feel free to modify https://github.com/jprante/elasticsearch-analysis-baseform/tree/master/src/main/resources to your requirements ;-)
N.B. for lowercasing (with some ambiguities), you could simply combine this baseform analyzer with a lowercase filter.
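For illustration, here is a rough sketch of such a combination as index settings, built as a Python dict so it can be dumped to JSON. It assumes the plugin registers a token filter of type `baseform` with a `language` parameter, so check the README of your version. The names `de_baseform` and `german_baseform` are made up for this example. The baseform filter is placed before `lowercase` because the dictionary lookup is case sensitive.

```python
import json

# Assumption: the plugin exposes a token filter of type "baseform"
# that takes a "language" parameter. Verify against your plugin version.
settings = {
    "analysis": {
        "filter": {
            # Hypothetical filter name, chosen for this example.
            "de_baseform": {"type": "baseform", "language": "de"}
        },
        "analyzer": {
            # Hypothetical analyzer name, chosen for this example.
            "german_baseform": {
                "type": "custom",
                "tokenizer": "standard",
                # Order matters: reduce to base form first, lowercase afterwards.
                "filter": ["de_baseform", "lowercase"],
            }
        },
    }
}

# Body that could be sent when creating an index.
print(json.dumps({"settings": settings}, indent=2))
```

With this order, tokens that arrive already lowercased are still not reduced, since the lookup happens before any case change.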
Actually, our problem is that users might enter the search string all lowercase, and the plugin then cannot convert it into its base form. The second problem is that we use this plugin in combination with the decompound plugin, which returns its tokens in lowercase, and there are cases where, for some reason, it does not return the tokens in their base form. E.g. Fleischtomaten is split into fleisch and tomate, but Datteltomaten is split into dattel and tomateN, and the baseform plugin cannot then convert tomaten into its base form because it is lowercase.
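To make the failure concrete, here is a toy model of a case-sensitive lookup in Python. The two-entry dictionary and both function names are hypothetical stand-ins, not the plugin's data or API. The second function shows one possible workaround (retry with a capitalized first letter), which is an assumption on my part and not something the plugin does.

```python
# Hypothetical stand-in for a case-sensitive baseform dictionary.
BASEFORMS = {"Tomaten": "Tomate", "Datteln": "Dattel"}


def baseform(token: str) -> str:
    """Case-sensitive lookup; unknown tokens pass through unchanged."""
    return BASEFORMS.get(token, token)


def baseform_with_retry(token: str) -> str:
    """Look up the token as given, then retry with a capitalized first letter.

    Illustrative workaround only, not plugin behaviour. The result is
    lowercased so it matches lowercase tokens from a decompounder.
    """
    hit = BASEFORMS.get(token)
    if hit is None:
        hit = BASEFORMS.get(token.capitalize())
    return (hit if hit is not None else token).lower()


for tok in ["Tomaten", "tomaten"]:
    print(tok, "->", baseform(tok), "/", baseform_with_retry(tok))
```

Note that such a retry would also merge pairs like Rasen and rasen, which is exactly the ambiguity that case sensitivity avoids.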