akin
Update recommendations on activity?
So it takes forever to generate everything with a decent dataset. Is there no way to update after specific activity without doing a total regeneration?
@Charuru no, not at the moment. That's primarily blocked by the feature to provide a weighted decay of the activity over time. If that were to go away, or to move into an entirely separate process, then the similarity to other users could be updated on-the-fly with far fewer recalculations (in theory, but not currently part of this code base).
Originally, there wasn't a need for live updating of recommendations - instead, it was a low overhead add-on to existing workflows that could be refreshed periodically as a batch job.
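To illustrate why splitting the decay out could enable on-the-fly updates: with an exponential decay, a user's decayed activity total can be maintained incrementally rather than by re-walking all history, because the decay factor is multiplicative. A rough conceptual sketch (not akin's implementation; the half-life constant is an arbitrary example):

```javascript
// Conceptual sketch only: an incrementally maintained decayed sum.
const HALF_LIFE_MS = 7 * 24 * 3600 * 1000; // example: one-week half-life
const LAMBDA = Math.log(2) / HALF_LIFE_MS;

function makeAccumulator() {
  return { total: 0, lastUpdate: 0 };
}

function addActivity(acc, weight, now) {
  // Rescale the existing total to "now", then add the new event.
  // No per-event recomputation is needed; old weights decay implicitly.
  acc.total *= Math.exp(-LAMBDA * (now - acc.lastUpdate));
  acc.total += weight;
  acc.lastUpdate = now;
}

const acc = makeAccumulator();
addActivity(acc, 1, 0);
addActivity(acc, 1, HALF_LIFE_MS); // the earlier weight has halved
console.log(acc.total.toFixed(2)); // "1.50"
```

Under this scheme the decay never forces a full regeneration; it is applied lazily whenever an accumulator is touched.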
What marks do you need to hit for activity throughput and recommendation refresh rate? Also, what sort of benchmarks are you seeing for processing time relative to dataset size?
Thanks for responding. Is there any chance of continuing development here? I need some other features, like taking dislikes into account.
I haven't benchmarked it closely, but on my 8700k it takes roughly 40 minutes to do 120k activity items and 10k users. I assume it's single-threaded. Does that sound about right to you? I'm okay with running this periodically, but it would be ideal if the decay could be split from the overall calculations, so that similarity could be fast and the decay could wait.
@Charuru there's certainly a chance of continuing development here, whether it's through feature requests or PRs.
Feature Request: Accounting for Dislikes
You can likely handle this one of two ways now (please correct me if I'm misunderstanding your use case):
- Provide a negative weight for that activity.
  - You can override the default `actionWeights` in `activity.service.js` via the `setActionWeight(action, weight)` method. That will accept positive and negative values, allowing you to customize the calculations to your needs.
- Continue collecting user activity for similarity purposes, but don't recommend an item to that user.
  - The `recommendation.service.js` has a `markRecommendationDNR(userId, itemId, itemMetadata)` method for this "do not recommend" need. Looks like it's not documented in the README, though!
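A minimal, self-contained sketch of both approaches. The method names `setActionWeight` and `markRecommendationDNR` mirror the library API mentioned above, but the bodies here are illustrative stand-ins, not akin's internals:

```javascript
// Illustrative stand-ins for the two approaches; not akin's actual code.
const actionWeights = { like: 1, view: 0.25 };

function setActionWeight(action, weight) {
  actionWeights[action] = weight; // negative values are allowed
}

const doNotRecommend = new Set();
function markRecommendationDNR(userId, itemId) {
  // The activity still exists for similarity purposes; the item is
  // only suppressed from this user's future recommendations.
  doNotRecommend.add(`${userId}:${itemId}`);
}

function scoreItem(activities) {
  // Sum the weighted activity; a dislike drags the score down.
  return activities.reduce((sum, a) => sum + (actionWeights[a.action] || 0), 0);
}

setActionWeight('dislike', -1);
console.log(scoreItem([{ action: 'like' }, { action: 'dislike' }])); // 0

markRecommendationDNR('u1', 'item42');
console.log(doNotRecommend.has('u1:item42')); // true
```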
Performance Improvements
Regarding your benchmark, that seems reasonable, if a bit long. I've seen similarly sized datasets run in under 10 minutes, so we're in the same ballpark (tens of minutes, that is). You're correct that this library is currently single-threaded, which is the primary bottleneck for performing the full calculation.
I'm interested to see what it'll take to get to an incremental update approach that would let it run immediately on that single thread. Performing those calculations on the fly would limit the throughput that the library can currently maintain, but that may be an acceptable trade-off compared to the full recalculation hit.
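One way the incremental idea could look, assuming decayed weights are factored out: running similarity numerators (e.g. cosine-similarity dot products) can be patched when new activity arrives, touching only user pairs that share the affected item. This is a hedged sketch of the concept, not akin code:

```javascript
// Conceptual sketch: incremental pairwise dot-product maintenance.
const vectors = new Map(); // userId -> Map(itemId -> weight)
const dots = new Map();    // "a|b" (sorted pair key) -> running dot product

function pairKey(a, b) { return a < b ? `${a}|${b}` : `${b}|${a}`; }

function recordActivity(userId, itemId, weight) {
  const vec = vectors.get(userId) || new Map();
  vec.set(itemId, (vec.get(itemId) || 0) + weight);
  vectors.set(userId, vec);
  // Only pairs sharing this item change; all other pairs are untouched,
  // which is where the "far fewer recalculations" saving comes from.
  for (const [other, otherVec] of vectors) {
    if (other === userId) continue;
    const otherWeight = otherVec.get(itemId);
    if (otherWeight) {
      const k = pairKey(userId, other);
      dots.set(k, (dots.get(k) || 0) + weight * otherWeight);
    }
  }
}

recordActivity('a', 'x', 1);
recordActivity('b', 'x', 2);
console.log(dots.get('a|b')); // 2
```

The full recalculation would only be needed when weights themselves change (e.g. a decay pass), which is exactly why separating decay into its own batch process makes the rest cheap.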
Thanks, glad to hear that you're still maintaining the project.
Negative weight sounds like it'll work for me, I'll be making use of DNR as well.
My issue re: performance is that I'd prefer to use slower servers and bigger datasets. I think incremental update would be a major boon, it would still be a batch job on a separate process though.
Got a second dataset, 97k userActivity, 1.4M userSimilarities, 5.2k userRecommendations, total time: 33:41.049 (m:ss.mmm)