
Logging loss/metrics per iteration

Open Northo opened this issue 1 year ago • 4 comments

I have tried as best I can to figure this out. I am sorry if the feature already exists and I simply did not find it.

Feature request: I want to log my loss and other metrics per iteration. Ideally, I would be able to supply a callback that is evaluated at each iteration. I am not certain how to interface this with the C backend.

Do you think this is possible to implement?

Northo avatar Apr 11 '23 10:04 Northo

Logging an error metric might or might not be possible depending on the combination of input parameters, as in some cases (particularly when using the NA_as_zero* options) calculating the error would involve a loop over the full users-by-items matrix.
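To illustrate the cost difference described above, here is a minimal sketch in plain Python. It is not cmfrec code: the function names, the dict-of-ratings layout, and the `predict` callable are all hypothetical, chosen only to show why treating missing entries as zeros makes an error metric touch every user-item pair.

```python
def sse_observed(ratings, predict):
    """Squared error over observed entries only.

    Cost grows with the number of observed ratings.
    `ratings` maps (user, item) -> value; `predict` is any callable (user, item) -> float.
    """
    return sum((r - predict(u, i)) ** 2 for (u, i), r in ratings.items())


def sse_missing_as_zero(ratings, predict, n_users, n_items):
    """Squared error when every unobserved entry counts as a zero.

    This naive version visits all n_users * n_items cells.
    """
    total = 0.0
    for u in range(n_users):
        for i in range(n_items):
            target = ratings.get((u, i), 0.0)  # missing entries are treated as 0
            total += (target - predict(u, i)) ** 2
    return total
```

With a few million users and items, the first function scales with the number of stored ratings, while the second scales with the product of the two dimensions.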

A callback could definitely be added in the future, but its arguments might not be easy to use, as the model might do extra processing on the inputs, such as reindexing rows/columns or padding with missing values.

david-cortes avatar Apr 11 '23 15:04 david-cortes

Thank you for the very swift reply! I then have two follow-up questions:

  1. Is there currently any option to log something as a function of iteration, such as the minimization objective?
  2. Is there interest in working on or following up on this? Any callback would be very nice, even if it is somewhat cumbersome to use.

For our application, investigating and understanding our convergence is quite important.

Northo avatar Apr 12 '23 14:04 Northo

  1. Only when using the LBFGS solver and setting verbose=TRUE. However, it doesn't support the same options as ALS, and it's not as scalable to large datasets.
  2. Not right now. Unfortunately, I don't think I'll work on this feature for the remainder of this year.

However, if you are using the ALS method, you shouldn't worry about overfitting from performing more iterations. It might even be faster to overshoot by a few iterations beyond convergence than to calculate a metric after each iteration, assuming that the amount of data is large.
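For anyone who still needs a convergence curve without a per-iteration callback, one workaround is to refit from scratch with increasing iteration budgets and evaluate a metric after each fit. The sketch below is an assumption-laden illustration, not part of cmfrec: `convergence_curve`, `toy_als`, and `objective` are hypothetical names, and the toy rank-1 ALS only stands in for a real model. It also assumes the fitting procedure is deterministic (fixed seed or initialization), so that runs with different budgets follow the same trajectory.

```python
def convergence_curve(fit_fn, metric_fn, iteration_counts):
    """Refit from scratch for each iteration budget and record a metric."""
    curve = []
    for n in iteration_counts:
        model = fit_fn(n)  # must be deterministic for the curve to be meaningful
        curve.append((n, metric_fn(model)))
    return curve


# Toy stand-in for a real model: rank-1 ALS on a few observed ratings.
RATINGS = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
           (1, 2): 1.0, (2, 1): 2.0, (2, 2): 1.0}
N_USERS, N_ITEMS, LAM = 3, 3, 0.1


def toy_als(niter):
    """Run `niter` alternating least-squares sweeps from a fixed start."""
    u = [1.0] * N_USERS
    v = [1.0] * N_ITEMS
    for _ in range(niter):
        for i in range(N_USERS):  # closed-form update of each user factor
            num = sum(r * v[j] for (a, j), r in RATINGS.items() if a == i)
            den = LAM + sum(v[j] ** 2 for (a, j) in RATINGS if a == i)
            u[i] = num / den
        for j in range(N_ITEMS):  # closed-form update of each item factor
            num = sum(r * u[i] for (i, b), r in RATINGS.items() if b == j)
            den = LAM + sum(u[i] ** 2 for (i, b) in RATINGS if b == j)
            v[j] = num / den
    return u, v


def objective(model):
    """Regularized squared error over observed entries."""
    u, v = model
    sq = sum((r - u[i] * v[j]) ** 2 for (i, j), r in RATINGS.items())
    return sq + LAM * (sum(x * x for x in u) + sum(x * x for x in v))


for n, obj in convergence_curve(toy_als, objective, [0, 1, 2, 5, 10]):
    print(n, round(obj, 4))
```

The total cost is the sum of all iteration budgets rather than the largest one, so this is only practical for a handful of checkpoints or for a subsample of the data. With a real library, `fit_fn` would build and fit a fresh model with the given number of iterations, and `metric_fn` would score it on held-out data.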

david-cortes avatar Apr 12 '23 16:04 david-cortes

Ok, thanks for the detailed feedback!

Northo avatar Apr 19 '23 07:04 Northo