Update recommendations on activity?

Open Charuru opened this issue 4 years ago • 5 comments

So it takes forever to generate everything with a decent dataset. Is there no way to update after specific activity without doing a total regeneration?

Charuru, Aug 14 '20 00:08

@Charuru no, not at the moment. Incremental updates are primarily blocked by the feature that applies a weighted decay to activity over time. If that feature were removed, or moved into an entirely separate process, then the similarity to other users could be updated on the fly with far fewer recalculations (in theory; this is not currently part of the code base).
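To illustrate why decay gets in the way, here is a minimal sketch of a time-decayed weight. It assumes an exponential half-life decay; the function name `decayedWeight`, the 30-day half-life, and the formula itself are hypothetical and are not taken from this code base. The point is only that the same activity scores differently depending on when the calculation runs, so any stored score goes stale as time passes.

```javascript
// Hypothetical sketch: NOT the library's actual decay formula.
const DAY_MS = 24 * 60 * 60 * 1000;

// Exponential decay: an activity loses half its weight every `halfLifeDays`.
function decayedWeight(baseWeight, activityTime, now, halfLifeDays = 30) {
  const ageDays = (now - activityTime) / DAY_MS;
  return baseWeight * Math.pow(0.5, ageDays / halfLifeDays);
}

const now = Date.UTC(2020, 7, 17); // Aug 17, 2020
const activity = { weight: 4, time: Date.UTC(2020, 6, 18) }; // 30 days earlier

// The same activity, evaluated at two different points in time.
const today = decayedWeight(activity.weight, activity.time, now);
const nextMonth = decayedWeight(activity.weight, activity.time, now + 30 * DAY_MS);

console.log(today, nextMonth);
```

Because every weight depends on `now`, a similarity built from these weights cannot simply be patched when one new activity arrives, unless decay is applied as a separate step.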

Originally, there wasn't a need for live updating of recommendations - instead, it was a low overhead add-on to existing workflows that could be refreshed periodically as a batch job.

What targets do you need to hit for activity throughput and for the recommendation refresh rate? Also, what run times are you seeing relative to dataset size?

brianghig, Aug 17 '20 11:08

Thanks for responding. Is there any chance of continuing development here? I need some other features, like taking dislikes into account.

I haven't benchmarked it closely, but on my 8700K it takes roughly 40 minutes to process 120k activity items and 10k users. I assume it's single-threaded. Does that sound about right to you? I'm okay with running this periodically, but it would be ideal if the decay could be split from the overall calculations so that similarity can be fast and decay can wait.

Charuru, Aug 17 '20 15:08

@Charuru there's certainly a chance of continuing development here, whether it's through feature requests or PRs.

Feature Request: Account for Dislikes

You can likely handle this in one of two ways now (please correct me if I'm misunderstanding your use case):

  1. Provide a negative weight for that activity.
    1. You can override the default actionWeights in activity.service.js via the setActionWeight(action, weight) method. That will accept positive and negative values, allowing you to customize the calculations to your needs.
  2. Continue collecting user activity for similarity purposes, but don't recommend an item to that user.
    1. The recommendation.service.js has a markRecommendationDNR(userId, itemId, itemMetadata) method for this "do not recommend" need. Looks like it's not documented in the README, though!
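As a rough illustration of how the two options differ, the sketch below uses stand-in objects that only mimic the two method signatures named above. Everything else (the default weights, the scoring function, the `recommend` helper, the IDs) is hypothetical and is not the library's implementation.

```javascript
// Stand-ins that mimic the two method signatures mentioned in this thread.
// They are NOT the library's code; only the method names and argument order
// come from the comment above.
const actionWeights = { view: 1, like: 3 }; // hypothetical defaults
const doNotRecommend = new Set();

const activityService = {
  setActionWeight(action, weight) {
    actionWeights[action] = weight;
  },
};

const recommendationService = {
  markRecommendationDNR(userId, itemId, itemMetadata) {
    doNotRecommend.add(`${userId}:${itemId}`);
  },
};

// Option 1: a dislike pulls an item's score down via a negative weight.
activityService.setActionWeight('dislike', -2);

// Option 2: activity is still collected, but the item is never shown to this user.
recommendationService.markRecommendationDNR('user-1', 'item-9', {});

// Hypothetical scoring: sum the weights of all actions recorded for an item.
function scoreItem(actions) {
  return actions.reduce((sum, action) => sum + (actionWeights[action] || 0), 0);
}

// Hypothetical recommender: drop non-positive scores and DNR items, best first.
function recommend(userId, candidates) {
  return Object.entries(candidates)
    .map(([itemId, actions]) => ({ itemId, score: scoreItem(actions) }))
    .filter((r) => r.score > 0 && !doNotRecommend.has(`${userId}:${r.itemId}`))
    .sort((a, b) => b.score - a.score);
}

const picks = recommend('user-1', {
  'item-7': ['view', 'like'],     // positive score, kept
  'item-8': ['like', 'dislike'],  // dislike lowers the score but it stays positive
  'item-9': ['like', 'like'],     // high score, but marked DNR
  'item-10': ['view', 'dislike'], // net negative, dropped
});

console.log(picks.map((r) => r.itemId));
```

In this sketch a negative weight changes the score itself, and would therefore also influence similarity between users, while DNR only filters the final output for one user.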

Performance Improvements

Regarding your benchmark, that seems reasonable, if a bit long. I've seen similarly sized data sets run in under 10 minutes, so we're in the same ballpark (tens of minutes, that is). You're correct that this library is currently single-threaded, which is the primary bottleneck for performing the full calculation.

I'm interested to see what it would take to get to an incremental update approach that could run immediately on that single thread. Performing those calculations on the fly would limit the throughput that the library can currently maintain, but that may be an acceptable trade-off against the cost of a full recalculation.
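For discussion, here is one way an incremental update could look if decay were handled elsewhere. This is a sketch under assumptions, not the library's design: it assumes cosine similarity over per-user item weights, and the class `IncrementalSimilarity` and all of its fields are hypothetical. Each new activity only touches the pairs involving users who already interacted with the same item.

```javascript
// Hypothetical incremental similarity store; none of these names exist in the library.
class IncrementalSimilarity {
  constructor() {
    this.weights = new Map();   // userId -> Map(itemId -> accumulated weight)
    this.itemUsers = new Map(); // itemId -> Set(userId), an inverted index
    this.dots = new Map();      // "a|b" -> dot product of the two users' vectors
    this.normsSq = new Map();   // userId -> squared length of the user's vector
  }

  // Order-independent key for a pair of users.
  static key(a, b) {
    return a < b ? `${a}|${b}` : `${b}|${a}`;
  }

  addActivity(userId, itemId, weight) {
    if (!this.weights.has(userId)) this.weights.set(userId, new Map());
    const mine = this.weights.get(userId);
    const old = mine.get(itemId) || 0;
    const updated = old + weight;
    mine.set(itemId, updated);

    // Adjust the squared norm by the change in this one component.
    const norm = this.normsSq.get(userId) || 0;
    this.normsSq.set(userId, norm + updated * updated - old * old);

    if (!this.itemUsers.has(itemId)) this.itemUsers.set(itemId, new Set());
    const users = this.itemUsers.get(itemId);

    // Only pairs that share this item change: dot(u, v) grows by weight * w(v, item).
    for (const other of users) {
      if (other === userId) continue;
      const k = IncrementalSimilarity.key(userId, other);
      const theirs = this.weights.get(other).get(itemId);
      this.dots.set(k, (this.dots.get(k) || 0) + weight * theirs);
    }
    users.add(userId);
  }

  // Cosine similarity from the maintained dot product and norms.
  similarity(a, b) {
    const dot = this.dots.get(IncrementalSimilarity.key(a, b)) || 0;
    const denom = Math.sqrt((this.normsSq.get(a) || 0) * (this.normsSq.get(b) || 0));
    return denom === 0 ? 0 : dot / denom;
  }
}

const sim = new IncrementalSimilarity();
sim.addActivity('alice', 'item-1', 1);
sim.addActivity('bob', 'item-1', 1);
sim.addActivity('bob', 'item-2', 1);
sim.addActivity('carol', 'item-3', 1);

console.log(sim.similarity('alice', 'bob'), sim.similarity('alice', 'carol'));
```

The cost per activity is proportional to the number of users who share the item rather than to the whole dataset, which is where the "far fewer recalculations" would come from. Popular items would still be expensive, and decay would need a separate periodic pass.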

brianghig, Aug 17 '20 18:08

Thanks, glad to hear that you're still maintaining the project.

Negative weights sound like they'll work for me, and I'll be making use of DNR as well.

My issue re: performance is that I'd prefer to use slower servers and bigger datasets. I think incremental updates would be a major boon, though it would still run as a batch job in a separate process.

Charuru, Aug 17 '20 19:08

Got a second dataset: 97k userActivity, 1.4M userSimilarities, 5.2k userRecommendations, total time: 33:41.049 (m:ss.mmm)

Charuru, Aug 21 '20 03:08