Running LightFM on new data points only

Open kshitijyad opened this issue 4 years ago • 4 comments

Hi, I have a question which I need to ask the community.

I have built a recommendation engine using LightFM, and I use only each customer's last 20 interactions to keep pace with customers' changing preferences.

Training with hyperparameter tuning takes 10 hours to complete. New data is added every day, so I have to rerun the model daily, and each daily run takes the full 10 hours.

I was wondering whether there is a way to run the model only for those users who have made interactions each day. My total user base is about 100k, and on average only about 10k log in each day. So instead of running the recommender for all 100k users each day, can I run it just for the 10k whose data changed? How can I reduce the 10 hours and make this more scalable? Kindly advise. Thanks

kshitijyad, Jan 06 '20 21:01

Hi, I'm not a LightFM contributor, but this seems to be a general question about recommender systems.

In your position I would perform hyperparameter tuning only once, ever. Or perhaps monthly/yearly? Include the number of previous interactions as a hyperparameter - 20 might not be optimal. This gives you the "best" hyperparameters for your particular problem domain, and the assumption is that these generalize across time. If you are changing the hyperparameters each day you are almost surely overfitting.

Without hyperparameter optimization (HPO), training alone would take about an hour or less.
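As a rough sketch of the "tune once, reuse daily" idea: the snippet below caches the tuned hyperparameters in a JSON file and reruns the search only when that file is missing. It also treats the history length as an ordinary parameter so it can be tuned along with the rest. The names load_or_tune, last_n_per_user, and tune_fn, the file path, and the (user, item, timestamp) event layout are all assumptions made for illustration, not something stated in this thread.

```python
import json
from collections import defaultdict, deque
from pathlib import Path


def load_or_tune(cache_path, tune_fn):
    """Return cached hyperparameters; run the expensive search only
    when no cache file exists yet."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text())
    params = tune_fn()  # the slow step (hypothetical search function)
    path.write_text(json.dumps(params))
    return params


def last_n_per_user(events, n):
    """Keep only each user's n most recent (user, item, timestamp) events."""
    recent = defaultdict(lambda: deque(maxlen=n))
    for user, item, ts in sorted(events, key=lambda e: e[2]):
        recent[user].append((user, item, ts))
    return [event for queue in recent.values() for event in queue]


# Daily job, with hypothetical names:
#   params = load_or_tune("best_params.json", run_hyperparameter_search)
#   events = last_n_per_user(all_events, params["n_last"])
#   ...then train once with the cached model hyperparameters.
```

Deleting the cache file forces a fresh search, which is one way to re-tune on a monthly or yearly schedule.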

ljmartin, Jan 07 '20 03:01

To train on new interactions for existing users only, you can use the fit_partial method to update your model incrementally.
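A minimal sketch of a daily incremental update, assuming users and items are already mapped to integer indices and that every daily matrix keeps the shape used in the initial fit. build_interactions, train_initial, daily_update, and top_k_for_users are hypothetical helper names, and the hyperparameter values are placeholders. LightFM, fit, fit_partial, and predict are the library's own.

```python
import numpy as np
from scipy.sparse import coo_matrix


def build_interactions(events, n_users, n_items):
    """Build an (n_users x n_items) COO matrix from (user_idx, item_idx) pairs.

    The shape is fixed so each daily matrix matches the initial training shape.
    """
    rows, cols = zip(*events)
    data = np.ones(len(rows), dtype=np.float32)
    return coo_matrix((data, (rows, cols)), shape=(n_users, n_items))


def train_initial(interactions, epochs=30):
    """Full fit, done once. Requires the lightfm package (imported lazily)."""
    from lightfm import LightFM

    model = LightFM(no_components=32, loss="warp")  # placeholder values
    model.fit(interactions, epochs=epochs)
    return model


def daily_update(model, new_events, n_users, n_items, epochs=5):
    """Continue training on today's interactions only."""
    if not new_events:
        return model  # nothing new today
    model.fit_partial(
        build_interactions(new_events, n_users, n_items), epochs=epochs
    )
    return model


def top_k_for_users(model, user_ids, n_items, k=10):
    """Score items only for the given users (for example, today's active ones)."""
    item_ids = np.arange(n_items, dtype=np.int32)
    recs = {}
    for u in user_ids:
        scores = model.predict(int(u), item_ids)
        recs[int(u)] = np.argsort(-scores)[:k].tolist()
    return recs
```

With this split, the daily job would call daily_update with the day's events and then top_k_for_users for only the users who were active, rather than refitting and rescoring everyone.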

AnshuSharma1, May 25 '20 12:05

@AnshuSharma1 does this mean that if I have new users and new interactions, I cannot use fit_partial? Instead I would need to retrain my whole model? I've been searching for some clarification on this so any insight would be helpful!

uncvrd, Sep 14 '20 04:09

@uncvrd You can use fit_partial only for users and items that the model has already been trained on. Say you have trained the model on 100 users and 100 items: you can add or change interactions for those users and items only. For entirely new users, there are two approaches:

1. Dummy-train the model with extra users and items that have no interaction data, and fill them in as you go. That is, you train with, say, 10 more users than exist at training time, then partially fit interactions for them as they arrive. You'll need to check your data and expected results with this approach.

2. LightFM generates user/item feature embeddings. So if you have features for a new user (e.g. gender), you can pass that feature vector under a dummy user id (one that exists in the model) through the user_features argument of model.predict, and you'll get estimated results. This can help until you can train on the user's interactions with model.fit. Make sure to pre-train your model with features for all users; otherwise it will assume an identity matrix.

ref - https://making.lyst.com/lightfm/docs/_modules/lightfm/lightfm.html#LightFM.get_user_representations
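Both approaches can be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: padded_interactions and cold_start_scores are hypothetical helper names, indices are assumed to be integers, and the new user's feature vector is assumed to use the same column layout as the user_features matrix the model was trained with. The second helper relies on predict looking up the given user id as a row of the user_features matrix passed to it, so a one-row matrix with user id 0 stands in for the new user.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix


def padded_interactions(events, n_users, n_items, spare_users=0, spare_items=0):
    """Approach 1: reserve all-zero rows/columns for users and items
    that do not exist yet, so later fit_partial calls keep the same shape."""
    shape = (n_users + spare_users, n_items + spare_items)
    rows, cols = zip(*events)
    data = np.ones(len(rows), dtype=np.float32)
    return coo_matrix((data, (rows, cols)), shape=shape)


def cold_start_scores(model, feature_row, n_items, item_features=None):
    """Approach 2: score all items for a user known only by features.

    feature_row: 1-D sequence of length n_user_features, laid out like
    the columns of the user_features matrix used in training.
    """
    row = np.asarray(feature_row, dtype=np.float32).reshape(1, -1)
    user_features = csr_matrix(row)
    item_ids = np.arange(n_items, dtype=np.int32)
    # User id 0 refers to the single row of the matrix built above.
    return model.predict(
        0, item_ids, user_features=user_features, item_features=item_features
    )
```

If the model was trained with item features, the same item_features matrix would need to be passed here as well. The padded rows in the first approach carry no signal until real interactions are fitted for them, which is why checking results on your own data matters.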

AnshuSharma1, Sep 15 '20 12:09