Tag suggestions algorithm

Open make-github-pseudonymous-again opened this issue 6 years ago • 2 comments

Currently, for a given input string, the tag suggestions algorithm lists all tags that have the input string as an exact prefix. Would there be a way to make this algorithm different or customizable? I was thinking of a matching logic that is a bit more fuzzy: I have tons of hierarchical tags, and sometimes I forget which tag is the parent of a given tag (the prefix). I then have to try a bunch of prefixes before I find the correct one.

In my case, prefix matching on each alphanumeric component of the tag (e.g. for a tag tr.cnf.socg18, match any prefix of tr, cnf, or socg18), substring matching, ranking all tags by Levenshtein distance with a threshold, or some variant of those should be faster to type for prefixes that are used a lot. With plain prefix matching, typing the entire prefix does not reduce the number of matches; the number of matches only drops once you start typing the next keyword in the tag hierarchy. It seems mail.google.com implements a variant of the first solution (prefix matching on each consecutive subsequence of alphanumeric components of the tag).

Another explanation is that my tag collection is disorganized, I am too lazy to reorganize it or memorize it, and plain prefix matching is fine.
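To make the first proposal concrete, here is a minimal sketch of prefix matching on each alphanumeric component of a tag. The function names (`split_components`, `matches_component_prefix`, `suggest`) are hypothetical and are not taken from astroid; the sketch also assumes ASCII tags, since every non-alphanumeric byte is treated as a separator.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of astroid): split a tag into its
// alphanumeric components, e.g. "tr.cnf.socg18" -> {"tr", "cnf", "socg18"}.
// Assumption: ASCII tags; any non-alphanumeric byte acts as a separator.
std::vector<std::string> split_components(const std::string& tag) {
    std::vector<std::string> parts;
    std::string current;
    for (unsigned char c : tag) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(c));
        } else if (!current.empty()) {
            parts.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}

// True if `input` is a prefix of at least one component of `tag`.
// An empty input matches every tag.
bool matches_component_prefix(const std::string& tag, const std::string& input) {
    if (input.empty()) return true;
    for (const std::string& part : split_components(tag)) {
        if (part.compare(0, input.size(), input) == 0) return true;
    }
    return false;
}

// Keep the tags that match, preserving the order of `tags`.
std::vector<std::string> suggest(const std::vector<std::string>& tags,
                                 const std::string& input) {
    std::vector<std::string> out;
    for (const std::string& tag : tags) {
        if (matches_component_prefix(tag, input)) out.push_back(tag);
    }
    return out;
}
```

With the tags `tr.cnf.socg18`, `tr.jrnl.dcg` and `todo`, the input `soc` would select only `tr.cnf.socg18`, without having to remember that it lives under `tr.cnf`. This sketch matches single components only; the variant attributed to mail.google.com above (consecutive subsequences of components) would need an extra loop over component ranges.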

Aurélien Ooms writes on November 14, 2018 10:50:

> Currently, for a given input string, the tag suggestions algorithm lists all tags that have the input string as an exact prefix. Would there be a way to make this algorithm different or customizable? I was thinking of a matching logic that is a bit more fuzzy: I have tons of hierarchical tags, and sometimes I forget which tag is the parent of a given tag (the prefix). I then have to try a bunch of prefixes before I find the correct one.
>
> In my case, prefix matching on each alphanumeric component of the tag (e.g. for a tag tr.cnf.socg18, match any prefix of tr, cnf, or socg18), substring matching, ranking all tags by Levenshtein distance with a threshold, or some variant of those should be faster to type for prefixes that are used a lot. With plain prefix matching, typing the entire prefix does not reduce the number of matches; the number of matches only drops once you start typing the next keyword in the tag hierarchy. It seems mail.google.com implements a variant of the first solution (prefix matching on each consecutive subsequence of alphanumeric components of the tag).
>
> Another explanation is that my tag collection is disorganized, I am too lazy to reorganize it or memorize it, and plain prefix matching is fine.

That makes sense, at least as a configurable option. See TagCompletion::match in command_bar.cc. Arbitrary logic could be implemented there; I'm not sure where the sorting would go, though!

gauteh, Nov 15 '18 18:11

I think I'd find a matcher with a configurable splitting character for hierarchies, followed by some fuzzy matching (e.g. Levenshtein) on the subtags, very attractive.

jorsn, Nov 04 '19 22:11
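The last suggestion (a configurable splitting character plus Levenshtein matching on the subtags) could look roughly like the sketch below. All names here (`levenshtein`, `split`, `Suggestion`, `fuzzy_suggest`) are hypothetical illustrations, not astroid code, and how this would plug into TagCompletion::match is left open. The sketch assumes that a tag's score is the smallest edit distance between the input and any of its subtags, and that tags above a caller-supplied threshold are dropped.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic two-row dynamic-programming Levenshtein distance
// (insertions, deletions and substitutions all cost 1).
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Split `tag` on the configurable separator `sep` (empty parts are kept).
std::vector<std::string> split(const std::string& tag, char sep) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t end = tag.find(sep, start);
        parts.push_back(tag.substr(start, end - start));
        if (end == std::string::npos) break;
        start = end + 1;
    }
    return parts;
}

struct Suggestion {
    std::string tag;
    std::size_t distance;  // smallest distance between the input and a subtag
};

// Score each tag by its closest subtag, drop tags farther than
// `max_distance`, and rank the rest by increasing distance.
std::vector<Suggestion> fuzzy_suggest(const std::vector<std::string>& tags,
                                      const std::string& input,
                                      char sep,
                                      std::size_t max_distance) {
    std::vector<Suggestion> out;
    for (const std::string& tag : tags) {
        bool found = false;
        std::size_t best = 0;
        for (const std::string& sub : split(tag, sep)) {
            if (sub.empty()) continue;  // ignore empty subtags such as in "a..b"
            std::size_t d = levenshtein(input, sub);
            if (!found || d < best) { best = d; found = true; }
        }
        if (found && best <= max_distance) out.push_back({tag, best});
    }
    // stable_sort keeps the original order among equally distant tags.
    std::stable_sort(out.begin(), out.end(),
                     [](const Suggestion& x, const Suggestion& y) {
                         return x.distance < y.distance;
                     });
    return out;
}
```

For example, `fuzzy_suggest(tags, "socg19", '.', 1)` would still find `tr.cnf.socg18` despite the typo. Because the distance is taken against whole subtags, a partially typed subtag is penalized for its missing characters; a completion-as-you-type variant might instead compare the input against a prefix of each subtag.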