astroid
Tag suggestions algorithm
Currently, for a given input string, the tag suggestion algorithm lists all tags that have the input string as an exact prefix. Would there be a way to make this algorithm different or customizable? I was thinking of matching logic that is a bit fuzzier. I have tons of tags with hierarchies, and sometimes I forget which tag is the parent of a given tag (its prefix), so I have to try a bunch of prefixes before I find the correct one. In my case, any of the following (or some variant of them) should be faster to type for prefixes that are used a lot:

- prefix matching on each alphanumeric component of the tag (e.g. for the tag tr.cnf.socg18, match any prefix of tr, cnf or socg18);
- substring matching;
- ranking all tags by Levenshtein distance, with a threshold.

With plain prefix matching, typing the entire prefix does not reduce the number of matches; the number of matches only drops once you start typing the next keyword in the tag hierarchy. It seems mail.google.com implements a variant of the first solution (prefix matching on each consecutive subsequence of alphanumeric components of the tag).
Another explanation is that my tag collection is disorganized, I am too lazy to reorganize it or memorize it, and plain prefix matching is fine.
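To make the difference between the current behaviour and the proposed alternatives concrete, here is a minimal Python sketch. Astroid itself is C++, and none of these function names come from its code; they are hypothetical. The sketch assumes tags are plain strings, treats any run of ASCII letters and digits as a component, and matches case-sensitively. The "boundary" variant is my reading of the Gmail-like behaviour described above: a prefix of any run of consecutive components is (up to edge cases with trailing separators) a prefix of the tag's remainder starting at a component boundary.

```python
import re

# Hypothetical helpers illustrating the matching strategies discussed above.
# A "component" is assumed to be any run of ASCII letters/digits; matching is case-sensitive.
COMPONENT = re.compile(r"[0-9A-Za-z]+")


def exact_prefix_match(query: str, tag: str) -> bool:
    """Current behaviour: the query must be a prefix of the whole tag."""
    return tag.startswith(query)


def component_prefix_match(query: str, tag: str) -> bool:
    """The query may be a prefix of any single component (tr, cnf or socg18)."""
    return any(m.group().startswith(query) for m in COMPONENT.finditer(tag))


def boundary_prefix_match(query: str, tag: str) -> bool:
    """Gmail-like variant: the query may be a prefix of the tag's remainder
    starting at any component boundary (so it can span several components)."""
    return any(tag.startswith(query, m.start()) for m in COMPONENT.finditer(tag))


def substring_match(query: str, tag: str) -> bool:
    """The query may appear anywhere in the tag."""
    return query in tag


def suggest(query, tags, matcher):
    """Keep the tags accepted by `matcher`, in their original order."""
    return [tag for tag in tags if matcher(query, tag)]


tags = ["tr.cnf.socg18", "tr.cnf.soda19", "tr.jnl.dcg", "todo"]
print(suggest("cnf", tags, exact_prefix_match))         # []
print(suggest("cnf", tags, component_prefix_match))     # ['tr.cnf.socg18', 'tr.cnf.soda19']
print(suggest("cnf.so", tags, component_prefix_match))  # []
print(suggest("cnf.so", tags, boundary_prefix_match))   # ['tr.cnf.socg18', 'tr.cnf.soda19']
print(suggest("g1", tags, substring_match))             # ['tr.cnf.socg18']
```

The boundary variant subsumes per-component prefix matching, and it also lets the query continue across a separator (cnf.so), which plain per-component matching cannot do.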
Reply to Aurélien Ooms's post of November 14, 2018 10:50:
That makes sense, at least as a configurable option. See TagCompletion::match in command_bar.cc; arbitrary matching logic could be implemented there, though I'm not sure where the sorting goes!
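On the sorting question, one possible design (hypothetical, and not a description of how astroid's TagCompletion is actually structured) is to have the matcher return a rank instead of a boolean, so that filtering and ordering are decided in one place. The three tiers below (exact prefix, then prefix at a later component boundary, then plain substring) are an assumption chosen for illustration; `tiered_matcher` and `complete` are made-up names.

```python
import re
from typing import Callable, Optional

# Hypothetical interface: a matcher returns a rank (lower is better) or None for "no match".
Matcher = Callable[[str, str], Optional[int]]

COMPONENT = re.compile(r"[0-9A-Za-z]+")  # assumed definition of a tag component


def tiered_matcher(query: str, tag: str) -> Optional[int]:
    if tag.startswith(query):
        return 0  # exact prefix of the whole tag (the current behaviour)
    if any(tag.startswith(query, m.start()) for m in COMPONENT.finditer(tag)):
        return 1  # prefix starting at a later component boundary
    if query in tag:
        return 2  # plain substring anywhere in the tag
    return None  # not a suggestion


def complete(query: str, tags: list[str], matcher: Matcher = tiered_matcher) -> list[str]:
    """Filter with the matcher, then sort by rank and, within a rank, alphabetically."""
    scored = []
    for tag in tags:
        rank = matcher(query, tag)
        if rank is not None:
            scored.append((rank, tag))
    scored.sort()
    return [tag for _, tag in scored]


print(complete("cnf", ["tr.cnf.socg18", "cnfs", "misc.xcnf", "todo"]))
# ['cnfs', 'tr.cnf.socg18', 'misc.xcnf']
```

Keeping the current exact-prefix behaviour as the top tier means existing habits still produce the same first suggestions, while the fuzzier matches only add candidates further down the list.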
I think I'd find it very attractive to have a matcher with a configurable splitting character for hierarchies, combined with some fuzzy matching (e.g. Levenshtein distance) on the subtags.
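A minimal Python sketch of that idea, under stated assumptions: each tag is split on a configurable separator (`sep`), every subtag is compared with the query using plain Levenshtein distance, and a tag is kept if its best subtag distance is at most `max_distance`, with results ranked by that distance. The function names and defaults are hypothetical. One further assumption of mine, not from the thread: the query is compared only against the first len(query) characters of each subtag, so that a partially typed subtag can still match; comparing against whole subtags would penalize short queries.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (each insertion, deletion or substitution costs 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_subtag_suggestions(query: str, tags: list[str], sep: str = ".",
                             max_distance: int = 1) -> list[str]:
    """Hypothetical matcher: keep tags where some subtag (split on `sep`), truncated to
    len(query), is within `max_distance` edits of the query; sort by distance, then name."""
    ranked = []
    for tag in tags:
        best = min(levenshtein(query, sub[:len(query)]) for sub in tag.split(sep))
        if best <= max_distance:
            ranked.append((best, tag))
    ranked.sort()
    return [tag for _, tag in ranked]


print(fuzzy_subtag_suggestions("sicg", ["tr.cnf.socg18", "tr.jnl.dcg", "todo"]))
# ['tr.cnf.socg18']  (one substitution away from "socg")
print(fuzzy_subtag_suggestions("astr", ["work/projects/astroid", "home/astro"], sep="/"))
# ['home/astro', 'work/projects/astroid']
```

Note that plain Levenshtein distance counts a swap of two adjacent letters as two edits; a variant that counts transpositions as one edit (Damerau-Levenshtein) may suit typos better, but that is a separate choice.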