
Value for regularization parameter, and other hyperparameters.

Open artdgn opened this issue 6 years ago • 3 comments

Hi, first thanks for the awesome package!

  1. Regularization: After some experiments with the ALS recommender on my data, it seems I'm not using the regularization hyperparameter the right way. I've run a basic random search over various combinations of parameters, and I'm checking the effect of each hyperparameter by measuring the mutual information between that parameter and my accuracy metric (see the sketch after this list). The values for factors, iterations, and the BM25 weighting parameters (B especially) turn out to be important for my data, but for regularization I get precisely 0, which implies that the parameter doesn't affect accuracy at all in my runs. The range I've been probing is 0.0001 up to 10.0 (with tens to hundreds of factors, and up to several tens of iterations). Is there any intuition you can share for finding an effective range for that parameter? For example, should the value depend on the number of users/items? (since it's only used in the λI term added to the YtY calculation)

  2. Number of iterations: at least on my data, across several different-sized subsets, there seems to be a sweet spot at 3 iterations, even though I would expect the number of required iterations to change with the model's capacity and the amount of data. Is there an intuition for that (I see the default in the .pyx file is 3 as well), or is it just a coincidence?
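
For reference, here's a minimal sketch of the random search and mutual-information check I described in point 1. `interactions` stands in for my sparse user-item count matrix and `evaluate` for my held-out accuracy metric; both are placeholders, not real code from my pipeline:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from implicit.als import AlternatingLeastSquares
from implicit.nearest_neighbours import bm25_weight

rng = np.random.default_rng(42)
names = ["factors", "regularization", "iterations", "B"]
trials = []
for _ in range(100):
    params = dict(
        factors=int(rng.choice([32, 64, 128, 256])),
        regularization=float(10 ** rng.uniform(-4, 1)),  # the 0.0001 .. 10.0 range
        iterations=int(rng.integers(1, 31)),
        B=float(rng.uniform(0, 1)),  # BM25 weighting parameter
    )
    weighted = bm25_weight(interactions, B=params["B"])  # interactions: sparse user-item counts
    model = AlternatingLeastSquares(factors=params["factors"],
                                    regularization=params["regularization"],
                                    iterations=params["iterations"])
    model.fit(weighted.T.tocsr())  # this implicit version expects an item-user matrix
    trials.append((params, evaluate(model)))  # evaluate: placeholder accuracy metric

# mutual information between each hyperparameter and the accuracy metric
X = np.array([[p[n] for n in names] for p, _ in trials])
y = np.array([score for _, score in trials])
print(dict(zip(names, mutual_info_regression(X, y, random_state=0))))
```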

artdgn avatar May 09 '18 00:05 artdgn

Thanks!

  1. For the regularization parameter, the range you've picked seems reasonable to me. The parameter just controls the strength of the L2 penalty used to help prevent overfitting. I'm not sure why it doesn't have much effect in your experiments =( I would just leave it set to a small value in your case. The best regularization values do change with the dataset, though; I've been thinking about changing it to scale with the number of non-zeros, like Spark MLlib does, to make it easier to cross-validate on a subset of the data (see the sketch after this list).

  2. The default is to run for 15 iterations, by which time it should have converged in most cases. However, the CG optimizer does a couple of conjugate-gradient steps per iteration (the cg_steps parameter), and I've defaulted that to 3 steps per iteration. I wrote a post explaining why 3 CG steps per iteration here: https://www.benfrederickson.com/fast-implicit-matrix-factorization/
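
To make point 1 concrete: in the implicit-feedback ALS normal equations (the Hu/Koren/Volinsky formulation), regularization only enters as a constant added to the diagonal of the Gram matrix, and the MLlib-style variant would scale that constant by each user's interaction count. A rough sketch of a single user's solve, illustrative only and not the library's actual code path:

```python
import numpy as np

def solve_user_factors(Y, cu, pu, reg, scale_by_nnz=False):
    """One user's least-squares solve in implicit-feedback ALS.
    Y: item factors (items x factors), cu: per-item confidences,
    pu: binary preferences. Illustrative sketch only."""
    if scale_by_nnz:
        # Spark-MLlib-style weighted lambda: the penalty grows with the
        # number of items this user interacted with, so a value tuned on
        # a subset of the data transfers to the full dataset.
        reg = reg * max(np.count_nonzero(pu), 1)
    YtCuY = Y.T @ (cu[:, None] * Y)       # confidence-weighted Gram matrix
    A = YtCuY + reg * np.eye(Y.shape[1])  # the L2 term only touches the diagonal
    b = Y.T @ (cu * pu)
    return np.linalg.solve(A, b)
```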

I think it's interesting that you're finding good results at 3 iterations. I'm guessing the model isn't fully converged then, and the reason you're seeing good results is that the early exit from model fitting is acting as a form of regularization, potentially making the L2 regularizer redundant.
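
One way to test that hypothesis: pin regularization near zero and sweep the iteration count; if accuracy peaks early and then degrades, early stopping is doing the regularizer's job. A quick sketch, where `item_user_counts` and `evaluate` are placeholders:

```python
from implicit.als import AlternatingLeastSquares

for iters in (1, 2, 3, 5, 10, 15, 25):
    # regularization pinned near zero isolates the early-stopping effect
    model = AlternatingLeastSquares(factors=128, regularization=1e-9,
                                    iterations=iters)
    model.fit(item_user_counts)    # placeholder: sparse item-user training matrix
    print(iters, evaluate(model))  # placeholder: held-out accuracy metric
```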

benfred avatar May 09 '18 01:05 benfred

Great, thanks for the quick response. I'll experiment more with those parameters and report back. If cg_steps proves to be an important parameter, it might be worth exposing it as an argument to __init__().

artdgn avatar May 09 '18 02:05 artdgn

I've done some more experiments.

  1. regularization: it seems that in my case regularization either has no effect or a detrimental one, so perhaps the number of iterations is indeed the better regularization mechanism here.
  2. cg_steps: it is indeed an important parameter, and different combinations of it with the number of iterations give different results (for me, 2 iterations with cg_steps=4 worked better). So it might be a good idea to add it as an argument to the __init__() function, with 3 as the default recommended value (see the sketch below). Would you accept a PR with that change?
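
Something like the following is what I have in mind; the cg_steps argument is hypothetical here, since in the current version it's hard-coded to 3 inside the .pyx solver:

```python
from implicit.als import AlternatingLeastSquares

# Proposed: expose cg_steps alongside iterations (hypothetical signature).
# Total CG work is roughly iterations * cg_steps, so the two trade off.
model = AlternatingLeastSquares(factors=128, regularization=0.01,
                                iterations=2, cg_steps=4)  # the combo that worked for me
model.fit(item_user_counts)  # placeholder: sparse item-user training matrix
```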

On another subject, I'm trying to make it easy to work with several different recommendation libraries and algorithms, and I've made a repository that integrates a couple of packages, including the ALS model (with BM25 weighting) from your package. It's still a work in progress, but perhaps some users of those packages can already benefit from it. Basically, I try to make them all obey a common API, with dataframes as inputs and outputs and common functionality for evaluation, tuning, and getting recommendations and similarities out, so that it's easy both to experiment and to deploy quickly (a rough sketch of the interface idea is below). Here's the repo: https://github.com/DomainGroupOSS/ml-recsys-tools. I'd be really happy for any feedback and other ideas/thoughts on this subject (I've only recently added ALS and implicit, which is why only some of the implicit algorithms are covered).
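
A minimal sketch of the common interface idea, with illustrative names only (see the repo for the actual API):

```python
import pandas as pd

class RecommenderBase:
    """Dataframe-in / dataframe-out interface (illustrative names only)."""

    def fit(self, interactions: pd.DataFrame) -> "RecommenderBase":
        """interactions: columns [user, item, weight]."""
        raise NotImplementedError

    def recommend(self, users, n=10) -> pd.DataFrame:
        """Top-n recommendations per user: columns [user, item, score]."""
        raise NotImplementedError

    def similar_items(self, items, n=10) -> pd.DataFrame:
        """Top-n neighbours per item: columns [item, similar_item, score]."""
        raise NotImplementedError
```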

artdgn avatar May 11 '18 01:05 artdgn