Fuzzy searching?

Open thomthom opened this issue 5 years ago • 3 comments

I was wondering if there was any interest in allowing fuzzy search?

Some of the background for this is that I maintain a C API where there's a lot of prefixes to keep things unique (product initials + "class" name + function name).

Right now it appears that the search matches only from the start of each object being searched.

For example: image

Compare to if I just type "editing": image

I'm very fond of how Sublime Text (and many other editors like VSCode etc) let you search. In the example above, if I typed "tee" it would value the upper case letters in the symbols so that "TextEditingEvents" ranked high.
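To make the idea concrete, here is a minimal sketch of that kind of scoring: the pattern has to match as a case-insensitive subsequence, and matches that land on a word start (first character, an upper-case letter, or a character after an underscore) earn a bonus. The function name `fuzzy_score` and the weights are assumptions chosen for illustration; this is not the Sublime Text or fts_fuzzy_match algorithm.

```python
from functools import lru_cache
from typing import Optional


def fuzzy_score(pattern: str, candidate: str) -> Optional[int]:
    """Best score for matching pattern as a subsequence of candidate.

    Returns None when pattern is not a subsequence at all.
    The weights below are arbitrary, illustrative values.
    """
    p = pattern.lower()
    c = candidate.lower()

    def is_word_start(i: int) -> bool:
        # First character, an upper-case letter, or right after an underscore.
        return i == 0 or candidate[i].isupper() or candidate[i - 1] == "_"

    @lru_cache(maxsize=None)
    def best(pi: int, ci: int, prev_matched: bool) -> Optional[int]:
        if pi == len(p):
            return 0          # whole pattern consumed
        if ci == len(c):
            return None       # candidate exhausted, pattern left over
        # Option 1: skip this candidate character.
        result = best(pi, ci + 1, False)
        # Option 2: match it, if the characters agree.
        if p[pi] == c[ci]:
            rest = best(pi + 1, ci + 1, True)
            if rest is not None:
                gain = 1
                if is_word_start(ci):
                    gain += 10    # favour TextEditingEvents for "tee"
                if prev_matched:
                    gain += 5     # favour consecutive runs
                if result is None or gain + rest > result:
                    result = gain + rest
        return result

    return best(0, 0, False)
```

With these weights, "tee" scores higher against "TextEditingEvents" (all three letters hit word starts) than against a name like "Streetname", where the letters only match mid-word.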

I have in the past experimented with fts_fuzzy_match for such functionality for some projects I'm working on. (https://github.com/forrestthewoods/lib_fts/) It's been working rather well.

More details on the logic here: https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/

Any interest in this?

thomthom avatar Jun 08 '20 18:06 thomthom

See here for a WIP implementation of a similar thing by @sizmailov: https://github.com/mosra/m.css/pull/149

I wanted to implement something like this (in particular the {tee, texee, texede} -> TextEditingEvent variant) when doing the original search implementation but I put it aside because it wasn't strictly needed for the MVP. The search is implemented as a trie, so I'm not sure if the libs you linked would be of any use here, but I think I could still dig up the original implementation somewhere and finish it -- if my time allows, I guess you get the idea based on the frequency I reply on the issues here :sweat_smile:

I maintain a C API where there's a lot of prefixes to keep things unique (product initials + "class" name + function name)

With the change I did for #127, I finally have my hands free to add some config option allowing this (a similar case is wanting to search without get_ / set_ prefixes).
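As a rough sketch of what such an option could do with a trie-based search: each symbol is inserted under its full name and again under the name with a configured prefix stripped, with both keys pointing at the same result. The `Trie` class, `index_symbol` and the `strip_prefixes` parameter are hypothetical names for illustration, not the actual m.css search code or configuration.

```python
class Trie:
    """Tiny prefix trie; every node can carry results for keys ending there."""

    def __init__(self):
        self.children = {}
        self.results = []

    def insert(self, key, result):
        node = self
        for ch in key.lower():
            node = node.children.setdefault(ch, Trie())
        node.results.append(result)

    def search(self, prefix):
        # Walk down the typed prefix, then collect everything below it.
        node = self
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.results)
            stack.extend(current.children.values())
        return sorted(set(found))


def index_symbol(trie, name, strip_prefixes=()):
    """Insert name, plus an alias with each matching prefix removed."""
    trie.insert(name, name)
    for prefix in strip_prefixes:
        if name.startswith(prefix) and len(name) > len(prefix):
            trie.insert(name[len(prefix):], name)


trie = Trie()
for symbol in ("get_width", "set_width"):
    index_symbol(trie, symbol, strip_prefixes=("get_", "set_"))
```

Here `trie.search("wid")` finds both `get_width` and `set_width`, while `trie.search("get")` still finds only the getter.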

mosra avatar Jun 08 '20 18:06 mosra

See here for a WIP implementation of a similar thing by @sizmailov: #149

Oh, that's interesting. I'm subscribing to that thread. Even though that's not complete fuzzy search, extracting words by camelCase or under_score separation still helps.
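For illustration, word extraction along those lines could look like the sketch below: split an identifier at camelCase and under_score boundaries, then produce the suffixes that start at each word so a prefix-only search can also match from the middle of a name. The names `split_identifier` and `word_suffixes` are assumptions made up for this example and are not taken from the linked PR.

```python
import re

# An acronym run before a capitalised word, a (capitalised) lower-case word,
# a trailing acronym run, or a run of digits. Underscores are skipped.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(name):
    """Split an identifier into its camelCase / under_score words."""
    return _WORD.findall(name)


def word_suffixes(name):
    """Suffixes of name that begin at each word boundary."""
    return [name[match.start():] for match in _WORD.finditer(name)]
```

With this, typing "editing" can match the `EditingEvents` suffix of `TextEditingEvents` even though the search itself only matches from the start of a key.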

The search is implemented as a trie, so I'm not sure if the libs you linked would be of any use here

I did give it a quick try a few weeks ago. But I got stumped trying to understand the structure of the search data. The article you mentioned explains a lot!

Having good debug visualization is key to understanding the data.

Indeed! Does m.css come with the tools to debug the search data?

if my time allows, I guess you get the idea based on the frequency I reply on the issues here

No worries, I fully understand the challenges of maintaining a project like this. Not expecting you to do anything. I started this thread because I had tried to provide a PR myself for such functionality, but unfortunately I got rather lost in the data structure and how data was obtained.

With the change I did for #127, I finally have my hands free to add some config option allowing this (a similar case is wanting to search without get_ / set_ prefixes).

Yes, we have a number of Get/Set prefixed functions (well, prefixed before the function name, but after the product and class prefixes). One thing I wanted to look into was sorting that still kept Get/Set functions next to each other without having to resort to documentation markup. I might revisit that later on.

thomthom avatar Jun 09 '20 13:06 thomthom

Does m.css come with the tools to debug the search data?

Yes, the documentation/test_doxygen/test_search.py can be used to visualize the search data the same way as was done in the article, including colors. I hope it still works properly, I haven't touched the visualization code for over two years :)

One thing I wanted to look into was sorting that still kept Get/Set functions next to each other

Unless I misremember how the lookup and result population behaves, that'll happen automagically when the get/set prefixes get stripped. Or .. you mean in the doxygen-generated output? Disable the SORT_MEMBER_DOCS option, that one is enabled by default and absolutely useless in my opinion -- with it disabled, you'll get the functions ordered the same way as in the file (and there I assume you keep getter and setter pairs together).
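For the Doxygen side, that amounts to a single line in the Doxyfile (a config fragment, assuming a stock Doxyfile where the option is still at its default of YES):

```
# Keep detailed member documentation in declaration order
# instead of sorting it alphabetically.
SORT_MEMBER_DOCS = NO
```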

mosra avatar Jun 09 '20 14:06 mosra