albert Give preference to exact matches

Give preference to exact matches

Open vspinu opened this issue 6 years ago • 18 comments

I have my own application named e which starts emacs built from source, but I commonly use built in emacs 25 with quite a mouthful name "GNU emacs 25 (GUI)". If I type "e" I do get the latter as the default completion because I have used it more often so far. I believe this is unnatural. FWIW, krunner does give priority to the exact matches.

Operating system Kubuntu 18.04
Desktop environment KDE
Albert version From source master

Jul 26 '18 22:07 vspinu

Whats the point in giving items that you use less often prio? Effort = frequenzy / prio. Why would you want to increase the effort to get what you want? The whole point of this app is self optimization.

Jul 27 '18 08:07 ManuelSchneid3r

It depends on how you define effort. If I can get the long item with "em" and short exact match with "e" both of which are effortless, but if I always get the long term no matter what I input and I have to use arrows to navigate to the shorter item then that's "effort".

This behavior seems natural to me but I might be biased in this regard. All text completion frameworks that I use do give priority to the exact match.

Jul 27 '18 11:07 vspinu

While the "freq vs prio" debate is another one to be had (or polled or whatever) I have another idea. How about multiplying a words weight by eg 2 if the edit distance is 0.

I just had the case, that I had the entry "mpv Media Player" which I wanted to find. I don't know how often I already hit it but I guess less than 5 times. So "mp" returns it at top, while "mpv" returns some "maven" folders somewhere on my disk. While edit_distance("mpv" -> "mpv") = 0, edit_distance("mpv" -> "maven") = 3

While writing this I had another idea. How difficult would it be to make the weighing-algo programmable? If you'd open the possibility that users tinker with the algo themselves, probably there is some productive outcome.

Jul 30 '18 18:07 idkCpp

This is actually a very cool idea. What I was proposing is a particular case of the weight == +Inf.

I don't know the details of the current algorithm, but the weight of older hits should decay in time. If the user suddenly started doing Java and working with mvn, then a few recent hits of mvn should outweigh hundreds of 1 year old hits of Media player which are no longer relevant.

How difficult would it be to make the weighing-algo programmable?

It would probably mean moving the algorithm from c++ to python at the least. I would surely love to play with the algo myself. In any case having a few parameters of the algo exposed (multiplicative weight, logarithmic capping, time decay) would go a very long way.

Jul 30 '18 22:07 vspinu

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Nov 05 '20 03:11 stale[bot]

Is this issue really to be closed? It seems like the discussion has not yet reached a satisfying result.

Dec 05 '20 09:12 idkCpp

@vspinu you are talking about the app extension right? atm a lot of extensions do not provide relevances, thats a problem which I want to adress in 0.18 with a global index which does the scoring implicitly. but it looks like your problem is the MRU score which takes precedence over the uninformed match score. relevance decay is already implemented. see https://github.com/albertlauncher/albert/blob/edd178d2d58002d69c0e5b25b927863251c8e28c/src/app/querymanager.cpp#L236 the weights decrease linear with age. maybe it should rather decrease quadratic.

Jan 05 '22 11:01 ManuelSchneid3r

@vspinu you are talking about the app extension right?

Not really. My request is very simple - "give priority to exact matches no matter what". If I typed "e" and there is such an application available, put it to the top regardless of what the relevance algorithm is saying.

Jan 05 '22 12:01 vspinu

You mean e is a perfect match for emacs, and it should be in front even if you ran ewhateverappname often in the past?

Atm recently used information takes precedence over the match score of an item you never ran.

Its basically like: if it matches you get your favorites then sort by count of matched characters divided by the total matchable characters an item has. This seems perfectly natural to me. However I am open to suggestions.

How would you handle this case: you want to run Firefox with a "fi" query. However there is also a file with the name "file". Without MRU info file would always appear in front which is annoying in the end.

Another option would be to mix the score but this is definitvely harder to get it right for everybody

Jan 06 '22 00:01 ManuelSchneid3r

you want to run Firefox with a "fi" query. However there is also a file with the name "file". Without MRU info file would always appear in front which is annoying in the end.

I am confused. Didn't you say that "fi" would have a score 2/2 and "file" 2/4, thus "fi" has precedence?

Mixing score and MRU stastistic seems natural to me. There could be a configuration option, a weight between 0 and 1. If 0, only MRU is used, if 1 only the score, everything in between is a weighted average of the two. It depends on how MRU is measured if it's a metric between 0 and 1 like the match score is then the average would make sense.

Jan 06 '22 08:01 vspinu

fi/firefox is 2/7. I agree that mixing is indeed better, but in the end Albert is intended to pimp your personal, recurrent workflows.

The linear weight is as trivial as it can get and I'd prefer to implement "real learning" instead of building around a thought out static scoring. For heavy users the latter would have hardly any advantage.

I thought a lot about getting sophisticated machine learning into it. There should be a real mapping of queries to items (atm only items itself get the score, without taking the query into account). Such that if you used Firefox (Firefox having an additional lodash index string "FF") and ffmpeg equally often in the past, you ran Firefox with fi and ffmpeg with ff, FF should result in ffmpeg instead of Firefox although MRU scores equal and the matchescore is perfect since ff/ff=1.

There is plenty of room for improvements besides the static scoring which is only relevant if you have never used the item.

I dare to say that nobody would use the ruler to find out their personal preferred mixing score since one cannot check the impact in a comprehensible way.

Jan 06 '22 11:01 ManuelSchneid3r

As a data scientist I very much like the learning approach. No need to get fancy about it IMO. I bet one can do 90% of a full blown ML with a simple linear regression.

The weighting scheme is a one parameter model. Estimate the only coefficient and bam, you have your ML ;) Advantage that people can tinker with the weight themselves and it's crysta clear what it does.

On the other hand I am not sure it's worth the effort. Since I opened an issue I was never botherd by the problem. The MRU does the job really well. It's just that exact match should be a special IMO. At least in all completion system that I used in the past this was the case.

Jan 06 '22 15:01 vspinu

It just occurred to me that maybe the OP wasn't very clear after all. By "exact match" I mean the full and exact match of the entire program name. In the OP case I really had a program named "e". So the score would be score(e/e) == 1. In your example from above I assumed that there is a program (or a shortcut) fi thus, an exact match, and score == 1 as well. What I claim is that a completion system should pop perfect matches on the top, no matter what the other sorting algorithms do.

Jan 07 '22 14:01 vspinu

Indeed i got this wrong (although it was pretty clear) but still the slightly changed firefox example would annoy me (How would you handle this case: you want to run Firefox with a "fi" query. However there is also a file with the name "fi". Without MRU info file would always appear in front which is annoying.)

Jan 07 '22 20:01 ManuelSchneid3r

However there is also a file with the name "fi". Without MRU info file would always appear in front which is annoying

If MRU is used as a disambiguation of the exact match then it's not a problem.

Jan 08 '22 12:01 vspinu

Simple order and slider like this? https://www.math3d.org/BjaLgHI1p

Jan 12 '22 13:01 ManuelSchneid3r

MRU and Match scores have to be normalized to [0,1] for each query otherwise the score is biased right? Like https://www.math3d.org/TeKRjPknx

Jan 12 '22 13:01 ManuelSchneid3r

Yes, normalization is the key, the scaling of the MRU should be 0, never used in past X months, to 1 = most frequently used in past X months.

Are you thinking going into weighting direction? To my mind implementing the "lexicographic" rule would fix the current issued and it's super simple. Just to be clear this is the algo that I meant:

1. Sort as it is currently implemented
2. Bring all the exact matches (score == 1) to the top
3. Disambiguate the matches at 2. with MRU

Of course it all hinges on how happy you are with 1. If you want to improve on 1 then weighiting scheme and auto-learning the parameter (T) might be worth considering.

Jan 12 '22 13:01 vspinu

Maybe like this $x\cdot T\cdot \left(1-y\right)+y$ ? Matchscore never ignored, additional usage score with user dependent weight. The former approach makes it possible to omit matchscores entirely.

Further there is the problem with the normalization. The usage score atm is somehow meaningless. I mean what is a usage score of 1 at all? Used every day? used weekly? frequency? Also I hate alfred for forgetting that fast. maybe an initial boost of "used at all" makes sense. (which somehow goes in the direction of the currently used usage-before-match-score strategy.

Oct 25 '22 17:10 ManuelSchneid3r

Matchscore never ignored,

So $y$ is a match score in [0, 1] an $x$ is the usage score. Not bad. Then within the same partial match group candidates are sorted by usage.

I would prefer to have a symmetric form $x(1-w)(1-y)+wy$ where $w\in[0,1]$ with the default of $w=\frac{1}{2}$.

Ideally $y$ should also be in $[0,1]$ for this to truly make sense. Hence the normalization problem that you talk about.

I mean what is a usage score of 1 at all? Used every day? used weekly? frequency?

I don't know how usage decays over time in albert but regardless of the algorithm one can normalize to the maximum usage so far. That is, 1 is scored by the candidate with the maximum usage statistic in the data base. Algorithmicly it amounts to tracking the maximal usage score over time and divide actual usages by this cap during the scoring. Do you see any drawbacks of this approach?

Oct 27 '22 15:10 vspinu

I would prefer to have a symmetric form $x(1-w)(1-y)+wy$ where $w\in[0,1]$ with the default of $w=\frac{1}{2}$.

https://www.math3d.org/jIRidMWGm For w=0 full match, full usage items have a score of 0;

Do you see any drawbacks of this approach?

No, normalization is needed and makes sense… in theory. But practically users operate on the lower end of match scores. As albert learns my usage, i learn the minimal amount of chars needed to get what I want. E.g. opening Chromium unsing 'c'+enter. I'll have to think about this a bit.

Nov 23 '22 22:11 ManuelSchneid3r

0.18 makes this user customizable. Still I dont think its perfect. let me know what you think

Dec 31 '22 13:12 ManuelSchneid3r

Thanks! I will play with it.

Jan 03 '23 13:01 vspinu

Any thoughs? I personally dont like it really. Permanently random items i dont use are preferred over my used ones.

Jan 18 '23 13:01 ManuelSchneid3r

I had loads of issues with installation this time. Finally got some time to figure them out. (BTW libqt6sql6-sqlite is not part of docker deps which I usually use to get up to speed).

Trying it now and don't see much of a difference. But I am using only files and apps in my workflow. I set the slider in the middle to get a better feel. So far everything works as expected.

But I see now that this feature is likely to be confusing though. It's hard to reason about it even if one knows the formula, let alone understanding what it does form the short description in the popup. So maybe it's not worth having it after all. Just bring exact matches to the top and leaving everything as it was would certainly solve my OP. This is how my emacs completion works and I am pretty happy with it.

Jan 19 '23 15:01 vspinu

then a few recent hits of mvn should outweigh hundreds of 1 year old hits of Media player which are no longer relevant.

That's the decay part. Decay: MRU. No decay at all MFU. Their influence decays the "older" they are. Weight is the algo we discussed. You can entirely ignore your past queries, use its information exclusively or anything between.

Jan 19 '23 17:01 ManuelSchneid3r

I tried it out and the original problem still persists. Exact matches are not reffered.

Plus there is a weird preference for prefix match. I can no longer select google-chrome with chro. It does not even show in the list. Same problem for files, only prefix seems to be matching.

At the same time e would match in the middle of Telegramin the above screenshot. Very confusing,

Jan 19 '23 23:01 vspinu

Plus there is a weird preference for prefix match.

there have never been substring matches. they make no sense anyway. probably "-" is not in your separators.

At the same time e would match in the middle of Telegramin the above screenshot. Very confusing,

indeed. telegram does not match on my system what happens if you disable all the optons in the app settings?

Jan 20 '23 00:01 ManuelSchneid3r

there have never been substring matches. they make no sense anyway. probably "-" is not in your separators.

I see, finally I understood the meaning of "separators" :)

I indeed messed the separators accidentally. My current ones are [\s\\\/\-\[\](){}#!?<>"'=+*.:,;_]+. google-chrome appears again.

Would be great to have a "Reset to Default" button to revert all changes to the original.

if you disable all the optons in the app settings?

I disabled everything in the app settings and in the application plugging settings. Same behavior. I think it has to do with the Exec name of the telegram. It starts with env. And I can confirm, env does match Telegram as well.

[Desktop Entry]
X-SnapInstanceName=telegram-desktop
Name=Telegram Desktop
Comment=Official desktop version of Telegram messaging app
Exec=env BAMF_DESKTOP_FILE_HINT=/var/lib/snapd/desktop/applications/telegram-desktop_telegram-desktop.desktop /snap/bin/telegram-desktop -- %u
Icon=/snap/telegram-desktop/4486/meta/gui/icon.png
Terminal=false
StartupWMClass=TelegramDesktop
Type=Application
Categories=Chat;Network;InstantMessaging;Qt;
MimeType=x-scheme-handler/tg;
Keywords=tg;chat;im;messaging;messenger;sms;tdesktop;
Actions=Quit;

[Desktop Action Quit]
Exec=env BAMF_DESKTOP_FILE_HINT=/var/lib/snapd/desktop/applications/telegram-desktop_telegram-desktop.desktop /snap/bin/telegram-desktop -quit
Name=Quit Telegram
Icon=/snap/telegram-desktop/4486/meta/gui/icon.png

One more thing, the settings wheel is not shown in the right corner of the display. It appears only on mouse hovering. See my screenshot in previous post. Is this intentional? It's fine with me as I already know it's there, but new users might not be able to discover it. This is on ubuntu 22.04.

Jan 20 '23 11:01 vspinu

I think it has to do with the Exec name of the telegram. It starts with env. And I can confirm, env does match Telegram as well.

ty, fix and new option incoming.

Is this intentional

yes. it serves as busy indicator from 0.18 on. well theres still the preferences/settings item and tray icon

Jan 20 '23 11:01 ManuelSchneid3r

albert albert copied to clipboard

Give preference to exact matches

albert
albert copied to clipboard