
Change recommendation engine

Open Danoloan10 opened this issue 3 years ago • 5 comments

Previously, only tracks with the most cluster matches were returned. Now, the weight of each track is calculated as follows:

(N + 2^K)*x

Where x is random, N is the number of matching clusters and K the number of total clusters that the reference track/list spans. This makes it so that:

  • The weight of the count of matching clusters is logarithmic with respect to the total number of spanned clusters (if a track has many tags, more matches mean less)
  • Tracks with less weight can come on top thanks to the random variable, so that the behaviour is less deterministic
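For illustration, the formula above could be sketched roughly as follows. This is not the code from the PR: the `Candidate` struct and the function names are hypothetical, and the distribution of x is not specified above, so it is assumed here to be uniform in [0, 1).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical candidate record: a track id plus the number of clusters it
// shares with the reference track/list (N in the formula).
struct Candidate
{
    int trackId;
    std::size_t matchingClusters;
};

// Weight as written in the description: (N + 2^K) * x.
double computeWeight(std::size_t n, std::size_t k, double x)
{
    return (static_cast<double>(n) + std::exp2(static_cast<double>(k))) * x;
}

// Ranks candidates by descending weight, drawing a fresh x for each candidate.
// Assumption: x is uniform in [0, 1).
std::vector<int> rankByWeight(const std::vector<Candidate>& candidates,
                              std::size_t spannedClusters,
                              std::mt19937& rng)
{
    std::uniform_real_distribution<double> dist{0.0, 1.0};

    std::vector<std::pair<double, int>> weighted;
    weighted.reserve(candidates.size());
    for (const Candidate& c : candidates)
        weighted.emplace_back(computeWeight(c.matchingClusters, spannedClusters, dist(rng)), c.trackId);

    std::sort(weighted.begin(), weighted.end(),
              [](const std::pair<double, int>& a, const std::pair<double, int>& b) { return a.first > b.first; });

    std::vector<int> result;
    result.reserve(weighted.size());
    for (const std::pair<double, int>& entry : weighted)
        result.push_back(entry.second);
    return result;
}
```

Because x is drawn per candidate, two calls with the same input can return different orderings, which is the non-deterministic behaviour described above.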

This somewhat fixes #217 (maybe helps with #54 as well?). This fix works only for the Clustering engine.

Danoloan10 avatar Jun 06 '22 19:06 Danoloan10

This PR is still a draft. I have noticed that the logic of selecting similar tracks is duplicated in Track::findSimilarTracks and TrackList::getSimilarTracks. Why is this? Is there any other place I'm missing?

The style of the improvement is lacking; it is more of a PoC. Extracting this logic so that the duplication is avoided might lead to a cleaner solution.

Danoloan10 avatar Jun 06 '22 19:06 Danoloan10

Thinking of a way of verifying this recommendation now that it is non-deterministic

Danoloan10 avatar Jul 08 '22 22:07 Danoloan10

Hi! I have been thinking about another approach that would work whatever recommendation engine is used: if we want to select N similar tracks, we ask the underlying recommendation engine for N*M similar tracks. As each engine is supposed to order its matches with the best matches first, we could then randomly select tracks, giving more weight to the first ones (e.g. a geometric distribution with p=0.5 or even 0.4). Unit tests remain valid, since we only make changes in the recommendation service (RecommendationService.cpp).
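A minimal sketch of that idea, assuming the engine has already returned its N*M candidates as a vector of track ids ordered best match first. The function name and signature are hypothetical and are not taken from RecommendationService.cpp; draws that fall past the end of the list are clamped to the last remaining candidate, which is one possible choice among several.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// orderedCandidates: track ids returned by the engine, best match first.
// Returns up to `count` distinct ids, biased towards the front of the list.
// p must be in (0, 1); a higher p favours the best matches more strongly.
std::vector<int> pickBiasedTowardsFront(std::vector<int> orderedCandidates,
                                        std::size_t count,
                                        double p,
                                        std::mt19937& rng)
{
    std::geometric_distribution<int> dist{p};

    std::vector<int> picked;
    picked.reserve(std::min(count, orderedCandidates.size()));
    while (picked.size() < count && !orderedCandidates.empty())
    {
        // Index 0 is drawn with probability p, index 1 with p*(1-p), and so on.
        // Draws past the end are clamped to the last remaining candidate.
        const std::size_t drawn = static_cast<std::size_t>(dist(rng));
        const std::size_t index = std::min(drawn, orderedCandidates.size() - 1);

        picked.push_back(orderedCandidates[index]);
        // Remove the pick so that it cannot be selected twice.
        orderedCandidates.erase(orderedCandidates.begin() + static_cast<std::ptrdiff_t>(index));
    }
    return picked;
}
```

With p=0.5 the best remaining match is taken about half of the time at each step, so the head of the list still dominates while lower-ranked tracks get a chance.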

epoupon avatar Jul 30 '22 12:07 epoupon

Hello! Thanks for following up on the issue 🙂

The problem I have is that, given an irregularly labeled corpus (i.e. some albums have more labels than others), the albums with the most labels will act as sinkholes. This will always happen if the "getSimilar" operation deterministically returns the tracks with the most matching labels. Even if you flatten the distribution by making the metric logarithmic, the sinkhole effect will only be diminished, not removed.

This is why my suggestion involves a nondeterministic engine. This way, the probability of a track being recommended is weighted by the number of labels actually matched, so the tracks that match the most labels will still most likely appear on top, but those that match fewer labels can still appear, adding variety to the radio.
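One way to picture this kind of weighted selection is sampling without replacement with `std::discrete_distribution`. This is only a sketch under assumptions: the weight of each candidate is taken to be its number of matched labels, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// weights[i]: number of labels that candidate i shares with the reference.
// Returns the indices of up to `count` candidates, drawn without replacement
// with probability proportional to their weight.
std::vector<std::size_t> sampleByMatchedLabels(std::vector<double> weights,
                                               std::size_t count,
                                               std::mt19937& rng)
{
    std::vector<std::size_t> picked;
    while (picked.size() < count)
    {
        // std::discrete_distribution requires a positive weight sum.
        const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
        if (total <= 0.0)
            break; // only non-matching candidates remain

        std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
        const std::size_t index = dist(rng);
        if (weights[index] <= 0.0)
            continue; // defensive: never accept a candidate with no matches

        picked.push_back(index);
        weights[index] = 0.0; // exclude this candidate from later draws
    }
    return picked;
}
```

Candidates with a weight of zero are never returned, regardless of the seed, so that part of the behaviour stays deterministic.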

There is a remaining problem with this though: in the current implementation of the radio, the whole queue is taken as a reference. This could mean that as songs are added, the recommendation gets fuzzier and fuzzier. A solution to this might be recommending based on just the last N songs of the queue. As for the tests, this engine still makes it impossible for tracks that match no labels to be recommended. That behavior is still deterministic and could be tested.
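Restricting the reference to the tail of the queue could look something like the helper below. It assumes the queue is available as a vector of track ids; the name `lastEntries` is hypothetical and does not come from the lms code base.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the last `maxCount` entries of the queue, in their original order,
// or the whole queue if it holds fewer entries than that.
std::vector<int> lastEntries(const std::vector<int>& queue, std::size_t maxCount)
{
    const std::size_t count = std::min(maxCount, queue.size());
    return std::vector<int>(queue.end() - static_cast<std::ptrdiff_t>(count), queue.end());
}
```

The result would then be passed to the engine as the reference list instead of the full queue.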

I have not yet thought about the impact of this in the Features engine because I have not had the opportunity to test it yet (half my corpus isn't in AcousticBrainz, it's too much Spanish underground rock :P)

Danoloan10 avatar Jul 30 '22 12:07 Danoloan10

Ah yes, I forgot about the annoying 'sinkhole' album effect. It would be really practical to find a solution that is not specific to one recommendation engine though. Maybe we are confusing two problems here:

  1. being able to select the tracks closest to a given set of tracks (no diversity involved, but possibly with additional filters)
  2. being able to bring some diversity to the radio mode (like proposing albums and/or artists other than the ones in the last entries of the queue)

I have the feeling we can build 2. on top of 1. I will think more about this (for example, an X% chance of filtering out the last N artists of the play queue, same for albums, things like that).
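As a rough sketch of how 2. might sit on top of 1.: take the ranked result of any engine and, with some probability, drop the tracks whose artist was played recently. The `RankedTrack` struct, the function name and the fallback to the unfiltered ranking are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <random>
#include <unordered_set>
#include <vector>

// Hypothetical result entry of an engine: a track id and its artist id.
struct RankedTrack
{
    int trackId;
    int artistId;
};

// With probability `filterProbability`, removes every track whose artist is
// in `recentArtists`. The relative order of the remaining tracks is kept.
std::vector<RankedTrack> diversify(const std::vector<RankedTrack>& ranked,
                                   const std::unordered_set<int>& recentArtists,
                                   double filterProbability,
                                   std::mt19937& rng)
{
    std::bernoulli_distribution applyFilter{filterProbability};
    if (!applyFilter(rng))
        return ranked;

    std::vector<RankedTrack> filtered;
    std::copy_if(ranked.begin(), ranked.end(), std::back_inserter(filtered),
                 [&recentArtists](const RankedTrack& track) { return recentArtists.count(track.artistId) == 0; });

    // Assumption: fall back to the unfiltered ranking rather than return nothing.
    if (filtered.empty())
        return ranked;
    return filtered;
}
```

The same shape would work for albums by swapping the artist id for an album id, and it stays independent of the engine that produced the ranking.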

epoupon avatar Jul 30 '22 13:07 epoupon

For the record, found some interesting info here: https://www.slideshare.net/BenFields/finding-a-path-through-the-juke-box-the-playlist-tutorial

epoupon avatar Sep 11 '22 13:09 epoupon

Will add radio mode improvements in #54

epoupon avatar Sep 11 '22 20:09 epoupon