Support keyword extraction for languages other than English

Open stephane-martin opened this issue 6 years ago • 1 comment

Hello,

Currently it seems that in the keyword extraction process, the stop words are hard-coded for English. As a result, when archiving content in another language, the selected keywords are very often stop words of that language (I mainly archive content in French...).

Maybe the list of stop words could be selected dynamically, based on automatic language detection? (See https://github.com/Mimino666/langdetect for example.)
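To make the suggestion concrete, here is a minimal sketch of what I have in mind. It is not based on the project's actual code: `STOP_WORDS`, `detect_language` and `extract_keywords` are hypothetical names, the stop-word lists are tiny placeholders, and the frequency-based extraction is only a stand-in for whatever the project really does. The only real API used is `detect` from the langdetect package linked above, and the sketch falls back to English when that package is missing or detection fails.

```python
import re
from collections import Counter

# Placeholder stop-word lists (assumption): real lists would be much longer
# and could be loaded from an external resource.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"},
    "fr": {"le", "la", "les", "un", "une", "des", "et", "de", "du",
           "en", "est", "dans", "pour"},
}

try:
    # Third-party package from the link above; optional in this sketch.
    from langdetect import detect as _detect
except ImportError:
    _detect = None


def detect_language(text, default="en"):
    """Return a language code such as 'fr', or `default` if detection fails."""
    if _detect is None:
        return default
    try:
        return _detect(text)
    except Exception:
        # langdetect raises an error on input with no usable features
        # (for example an empty string); fall back instead of failing.
        return default


def extract_keywords(text, limit=5, detector=detect_language):
    """Hypothetical helper: most frequent words that are not stop words."""
    lang = detector(text)
    # Unknown languages fall back to the English list.
    stop_words = STOP_WORDS.get(lang, STOP_WORDS["en"])
    # Match runs of letters only, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if len(w) > 1 and w not in stop_words)
    return [word for word, _ in counts.most_common(limit)]
```

With the French list selected, words such as "le" and "la" no longer show up as keywords. The `detector` parameter is there only so the language lookup can be swapped or stubbed out.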

Thanks for the great product :)

stephane-martin, Dec 19 '18 21:12

Yes, currently only English is supported. I'll look into supporting other languages as well.

kanishka-linux, Dec 20 '18 04:12