flatnotes: some Chinese characters cannot be searched

To reproduce, copy some Chinese text into a note, for example this article: https://m.bjnews.com.cn/detail/154105774414080.html. Searching for 感冒 ("cold") returns no results, but searching for 订阅 ("subscribe") works.
Version: docker:latest (0c7b3e7aaec0)

[Screenshots: search for 感冒; search for 订阅]

Oct 19 '22 04:10 0x9394

I think the problem here is that the search index is currently tailored very specifically to English. When text is added to the index, it goes through a number of steps:

1. Tokenisation - This breaks the text up into "tokens" (the words to be indexed). Currently flatnotes uses a regex tokeniser (\w+(\.?\w+)*), which is fairly generic but may or may not work well with Chinese text.
2. Lower Case Filtering - I imagine this would be fine for any language.
3. Accent Folding - This ensures that "café" is indexed as "cafe". Again, this should work fine for any language.
4. Stop Word Removal - This filters out common English words that are not useful for searches, e.g. "for", "from" and "have". It shouldn't break non-English text, but other languages won't get the benefit this step is designed for.
5. Stemming - This process tries to "normalise" related words. For example, a note containing one of "render", "rendered", "renders" or "rendering" could be found by searching for any of those words. flatnotes currently uses the Porter stemming algorithm, which is designed to remove suffixes from English words.
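As a rough illustration of the steps above, here is a minimal pure-Python sketch of such a pipeline. It is not flatnotes' actual code: the function names are hypothetical, the stop-word list is a tiny made-up sample, and `naive_stem` is a crude suffix stripper standing in for the real Porter stemmer. It also shows one plausible (unconfirmed) explanation for the Chinese search problem. Python's `\w` matches CJK characters, so a run of Chinese text with no spaces comes out of the regex tokeniser as one long token, and a shorter word such as 感冒 inside that run never becomes a token of its own.

```python
import re
import unicodedata

# The tokeniser regex quoted above.
TOKEN_PATTERN = re.compile(r"\w+(\.?\w+)*")

# Hypothetical, tiny stop-word sample (a real English list is much longer).
STOP_WORDS = {"for", "from", "have", "the", "a", "and"}


def tokenise(text):
    # finditer + group(0) yields whole matches; findall would return the capture group.
    return [m.group(0) for m in TOKEN_PATTERN.finditer(text)]


def fold_accents(token):
    # Decompose (NFKD) and drop combining marks, so "café" becomes "cafe".
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(token):
    # Crude suffix stripping as a stand-in for Porter stemming (NOT the Porter algorithm).
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyse(text):
    tokens = tokenise(text)                               # 1. tokenisation
    tokens = [t.lower() for t in tokens]                  # 2. lower case filtering
    tokens = [fold_accents(t) for t in tokens]            # 3. accent folding
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 4. stop word removal
    return [naive_stem(t) for t in tokens]                # 5. stemming (approximated)


if __name__ == "__main__":
    print(analyse("Rendered renders rendering render"))
    print(analyse("Café for everyone"))
    # Made-up sample sentence: each unspaced Chinese run stays a single token.
    print(tokenise("北京感冒高发，请订阅本报"))
```

With the made-up sample sentence, the tokeniser produces only "北京感冒高发" and "请订阅本报", so a search for 感冒 has no matching token, whereas a 订阅 that appears on its own (for example as a standalone "subscribe" label) would be indexed as its own token.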

I'd love to be able to support other languages, but I'd likely need to make some changes to the index and possibly add options for it.
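One possible kind of index change, sketched here only as an assumption and not as anything flatnotes implements, is character-bigram indexing for CJK text: each unspaced run of Chinese, Japanese or Korean characters is split into overlapping two-character tokens, so 感冒 becomes an indexable token even inside a longer run. The function names below are hypothetical, the Unicode ranges cover only the basic CJK Unified Ideographs, Hiragana/Katakana and Hangul syllable blocks, and search queries would have to be tokenised the same way for matches to line up.

```python
import re

# Same generic word regex as the existing tokeniser.
WORD = re.compile(r"\w+(\.?\w+)*")
# Basic CJK Unified Ideographs, Hiragana + Katakana, Hangul syllables (simplified ranges).
CJK_RUN = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]+")


def cjk_bigrams(run):
    # A single character is kept as-is; longer runs become overlapping pairs.
    if len(run) == 1:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def tokenise_multilingual(text):
    tokens = []
    for match in WORD.finditer(text):
        word = match.group(0)
        pos = 0
        for cjk in CJK_RUN.finditer(word):
            # Any non-CJK text before this run is kept as one ordinary token.
            if cjk.start() > pos:
                tokens.append(word[pos:cjk.start()])
            tokens.extend(cjk_bigrams(cjk.group(0)))
            pos = cjk.end()
        # Trailing non-CJK text (or the whole word, if it had no CJK run).
        if pos < len(word):
            tokens.append(word[pos:])
    return tokens


if __name__ == "__main__":
    print(tokenise_multilingual("北京感冒高发，请订阅本报"))
    print(tokenise_multilingual("flatnotes 感冒药"))
```

The trade-off is a larger index and some spurious tokens that straddle word boundaries (such as 京感 in the sample), in exchange for not needing a language-specific word segmenter.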

Oct 31 '22 11:10 dullage

I, too, would like other languages to be supported. Searching in Korean does not return any results.

Jun 01 '23 01:06 Luxosity

Same issue

Nov 11 '23 09:11 kangfenmao