Cataclysm-BN
Cataclysm-BN copied to clipboard
feat(UI): korean and english fuzzy search
Purpose of change
improve ergonomics of filtering items.
Describe the solution
lcmatch
now uses std::wregex
to fuzzy-search items.
e.g mgzn
-> m.*?g.*?z.*?n
this isn't the best fuzzy search algorithm, however it's not too complex and supports complex fuzzy searching in korean (e.g match '감' with either 'ㄱ' or '가')
Describe alternatives you've considered
using std::locale
on language.cpp
diff
diff --git a/src/language.cpp b/src/language.cpp
index da7c57a00568..657252dd9541 100644
--- a/src/language.cpp
+++ b/src/language.cpp
@@ -2,6 +2,7 @@
#include <algorithm>
#include <fstream>
+#include <locale>
#if defined(_WIN32)
# if 1 // Prevent IWYU reordering platform_win.h below mmsystem.h
@@ -158,23 +159,9 @@ static std::string getSystemUILang()
// Linux / Android
static std::string getSystemUILang()
{
- std::string ret;
+ const auto lang = std::setlocale( LC_ALL, std::locale( "" ).name().c_str() );
- const char *language = getenv( "LANGUAGE" );
- if( language && language[0] != '\0' ) {
- ret = language;
- } else {
- const char *loc = setlocale( LC_MESSAGES, nullptr );
- if( loc != nullptr ) {
- ret = loc;
- }
- }
-
- if( ret == "C" || string_starts_with( ret, "C." ) ) {
- ret = "en";
- }
-
- return to_valid_language( ret );
+ return to_valid_language( lang );
}
#endif // _WIN32 / !MACOSX
filtering entries by regex relevance in uilist::filterlist
this would be a big change, thus deserves its own PR.
Testing
added string_fuzzy_search_test.cpp
.
fuzzy search in english
https://github.com/cataclysmbnteam/Cataclysm-BN/assets/54838975/2577c8c6-efd9-42e4-963e-2d5dea38176a
fuzzy search in korean
https://github.com/cataclysmbnteam/Cataclysm-BN/assets/54838975/aec97fa9-9ef1-4a21-8c0a-9c47cce87bae
Additional context
reference: https://taegon.kim/archives/9919
Or, thinking about it, maybe it could perform fuzzy search when string starts with some special character. As adding toggles to all places expecting text must be pretty difficult. e.g. "can" - regular, "~can" - fuzzy.
or maybe we could improve the logic to make lcmatch
return 'certainty score', and sort the result accordingly.
Right, that would also work. In fact, you don't really need to bother with scoring - instead you can rely on simple assumption that strict matches are most accurate. So, make two list - strict and fuzzy, remove from fuzzy list items which already presented in strict list, merge two lists. This way searching for "can" i'd see "canned" stuff in first pages, and "casing" somewhere deep down.
makes sense. about designing API, what do you think of this approach?
- change return type of
lcmatch
toint
. it represents 'score'. - strict match returns
INT_MAX
. - loose match returns
1
(or arbitrary score value if algorithm gets better) - sort entries by score.
This approach would let us incrementally implement better fuzzy-searching algorithm.
That'll break alphabetical sorting. I'd change step 4 to "sort by score, and then alphabetically within same score groups"
alright. that sounds good. will investigate further.
And, also, if it does scoring anyway, it might make sense to have multiple scores even for strict search, not just INT_MAX
.
E.g. "exact\word match", "entry\word starts with strings", "contains strings somewhere else".
Chances that i'm searching for "american flag" typing "can" are pretty low.
setting it to draft until mentioned changes are done.