Decompounding with compound languages

Open danka74 opened this issue 5 years ago • 1 comments

Dear All,

compound languages such as the Germanic and Scandinavian languages (German, Dutch, Swedish, Danish, Norwegian, Finnish, ...) do not benefit from word-start searches as much as non-compound languages such as English.

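e.g. English "alcohol abuse" is two words, but Swedish "alkoholmissbruk" is a single word. Decompounding would split it: "alkoholmissbruk" -> "alkohol-miss-bruk", so that a word-start search can also match the inner parts.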

There are a number of decompounding projects on GitHub which might be reused when creating the description index (https://github.com/search?q=decompounding), though not all of them are actively maintained.
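
To make the idea concrete, here is a minimal sketch of dictionary-based decompounding. It is not taken from Snowstorm or from any of the projects in the search above; the function name `split_compound`, the `min_part` parameter and the three-word lexicon are all made up for illustration. A real implementation would need a proper word list per language and would probably also have to handle linking elements between parts, which this sketch ignores.

```python
def split_compound(word, lexicon, min_part=3):
    """Split `word` into parts that are all in `lexicon`.

    Hypothetical helper for illustration only. Longer candidate parts are
    tried first; if no complete segmentation exists, the word is returned
    unsplit as a single-element list.
    """
    word = word.lower()
    memo = {}  # start index -> list of parts, or None if no segmentation

    def segment(start):
        if start == len(word):
            return []  # reached the end: valid (empty) remainder
        if start in memo:
            return memo[start]
        result = None
        # Candidate end positions, longest part first, never shorter than min_part.
        for end in range(len(word), start + min_part - 1, -1):
            part = word[start:end]
            if part in lexicon:
                rest = segment(end)
                if rest is not None:
                    result = [part] + rest
                    break
        memo[start] = result
        return result

    parts = segment(0)
    return parts if parts is not None else [word]


# Toy lexicon (an assumption, not real data).
lexicon = {"alkohol", "miss", "bruk"}
print(split_compound("alkoholmissbruk", lexicon))  # ['alkohol', 'miss', 'bruk']
```

The parts could then be indexed alongside the original term, so that a word-start search on "miss" or "bruk" also finds "alkoholmissbruk".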

danka74 avatar Sep 14 '20 11:09 danka74

Great idea @danka74. The license of the library used is another consideration. Snowstorm currently uses Apache 2.0, so the library would have to be compatible with this. We welcome community collaboration on this.

kaicode avatar Sep 17 '20 11:09 kaicode