Cataclysm-BN icon indicating copy to clipboard operation
Cataclysm-BN copied to clipboard

feat(UI): korean and english fuzzy search

Open scarf005 opened this issue 1 year ago • 8 comments

Purpose of change

improve ergonomics of filtering items.

Describe the solution

lcmatch now uses std::wregex to fuzzy-search items. e.g mgzn -> m.*?g.*?z.*?n

this isn't the best fuzzy search algorithm, however it's not too complex and supports complex fuzzy searching in korean (e.g match '감' with either 'ㄱ' or '가')

Describe alternatives you've considered

using std::locale on language.cpp

diff
diff --git a/src/language.cpp b/src/language.cpp
index da7c57a00568..657252dd9541 100644
--- a/src/language.cpp
+++ b/src/language.cpp
@@ -2,6 +2,7 @@
 
 #include <algorithm>
 #include <fstream>
+#include <locale>
 
 #if defined(_WIN32)
 #  if 1 // Prevent IWYU reordering platform_win.h below mmsystem.h
@@ -158,23 +159,9 @@ static std::string getSystemUILang()
 // Linux / Android
 static std::string getSystemUILang()
 {
-    std::string ret;
+    const auto lang = std::setlocale( LC_ALL, std::locale( "" ).name().c_str() );
 
-    const char *language = getenv( "LANGUAGE" );
-    if( language && language[0] != '\0' ) {
-        ret = language;
-    } else {
-        const char *loc = setlocale( LC_MESSAGES, nullptr );
-        if( loc != nullptr ) {
-            ret = loc;
-        }
-    }
-
-    if( ret == "C" || string_starts_with( ret, "C." ) ) {
-        ret = "en";
-    }
-
-    return to_valid_language( ret );
+    return to_valid_language( lang );
 }
 #endif // _WIN32 / !MACOSX

filtering entries by regex relevance in uilist::filterlist

this would be a big change, thus deserves its own PR.

Testing

added string_fuzzy_search_test.cpp.

fuzzy search in english

https://github.com/cataclysmbnteam/Cataclysm-BN/assets/54838975/2577c8c6-efd9-42e4-963e-2d5dea38176a

fuzzy search in korean

https://github.com/cataclysmbnteam/Cataclysm-BN/assets/54838975/aec97fa9-9ef1-4a21-8c0a-9c47cce87bae

Additional context

reference: https://taegon.kim/archives/9919

scarf005 avatar Nov 26 '23 07:11 scarf005

Or, thinking about it, maybe it could perform fuzzy search when string starts with some special character. As adding toggles to all places expecting text must be pretty difficult. e.g. "can" - regular, "~can" - fuzzy.

Vollch avatar Dec 03 '23 00:12 Vollch

or maybe we could improve the logic to make lcmatch return 'certainty score', and sort the result accordingly.

scarf005 avatar Dec 04 '23 00:12 scarf005

Right, that would also work. In fact, you don't really need to bother with scoring - instead you can rely on simple assumption that strict matches are most accurate. So, make two list - strict and fuzzy, remove from fuzzy list items which already presented in strict list, merge two lists. This way searching for "can" i'd see "canned" stuff in first pages, and "casing" somewhere deep down.

Vollch avatar Dec 04 '23 00:12 Vollch

makes sense. about designing API, what do you think of this approach?

  1. change return type of lcmatch to int. it represents 'score'.
  2. strict match returns INT_MAX.
  3. loose match returns 1 (or arbitrary score value if algorithm gets better)
  4. sort entries by score.

This approach would let us incrementally implement better fuzzy-searching algorithm.

scarf005 avatar Dec 04 '23 01:12 scarf005

That'll break alphabetical sorting. I'd change step 4 to "sort by score, and then alphabetically within same score groups"

Vollch avatar Dec 04 '23 01:12 Vollch

alright. that sounds good. will investigate further.

scarf005 avatar Dec 04 '23 01:12 scarf005

And, also, if it does scoring anyway, it might make sense to have multiple scores even for strict search, not just INT_MAX. E.g. "exact\word match", "entry\word starts with strings", "contains strings somewhere else". Chances that i'm searching for "american flag" typing "can" are pretty low.

Vollch avatar Dec 04 '23 01:12 Vollch

setting it to draft until mentioned changes are done.

scarf005 avatar Dec 09 '23 01:12 scarf005