Negative Loss on gpu ALS model

Open sikhad opened this issue 5 years ago • 1 comments

I'm getting a negative loss value (loss = -0.0346) when running ALS on the GPU, no matter how I vary the parameters. Running the same data and parameters on the CPU gives a positive loss. I don't understand how the loss could be negative.

The input is a ~6,500 x 1M CSR matrix.

import implicit.als

params = {'factors': 64,
          'use_gpu': True,
          'use_native': True,
          'use_cg': True,
          'regularization': 0,
          'num_threads': 0,
          'iterations': 5,
          'calculate_training_loss': True}

# initialize a model
model = implicit.als.AlternatingLeastSquares(**params)

# train the model on a sparse matrix of item/user/confidence weights
# (csr is the CSR matrix described above)
model.fit(csr, show_progress=True)

sikhad avatar Jul 14 '20 18:07 sikhad

It's looking like the GPU loss calculation might be buggy (see also #441).

benfred avatar Jan 13 '22 16:01 benfred

There was a bug with the calculate_training_loss parameter when number_of_items * number_of_users was bigger than 2**31. This will be fixed by #663 in the next release.
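To make the threshold concrete, here is a minimal sketch that checks whether a given matrix shape crosses 2**31 and shows what a signed 32-bit product would wrap to. It is not the library's code: the helper names `exceeds_int32` and `wrapped_int32_product` are made up for illustration, and the assumption that the bug came from a signed 32-bit wraparound is mine, not something confirmed above.

```python
INT32_LIMIT = 2**31  # threshold mentioned in the comment above


def exceeds_int32(n_users: int, n_items: int) -> bool:
    """Return True when n_users * n_items is at or past 2**31."""
    return n_users * n_items >= INT32_LIMIT


def wrapped_int32_product(n_users: int, n_items: int) -> int:
    """Simulate a signed 32-bit multiply (two's complement wraparound).

    Assumption for illustration only: the product is stored in an int32.
    """
    low_bits = (n_users * n_items) & 0xFFFFFFFF  # keep the low 32 bits
    # Values with the top bit set are negative in two's complement.
    return low_bits - 2**32 if low_bits >= INT32_LIMIT else low_bits


# Shape reported in this issue: roughly 6,500 x 1,000,000.
print(exceeds_int32(6_500, 1_000_000))          # True
print(wrapped_int32_product(6_500, 1_000_000))  # -2089934592
```

For the shape reported here the true product is 6.5e9, well past 2**31, so this issue's matrix falls in the affected range.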

Thanks for reporting, and sorry about the lengthy delay in getting this resolved.

benfred avatar Jun 06 '23 21:06 benfred

Thanks. Will you release a new pip module version?

gallir avatar Jun 06 '23 21:06 gallir

@gallir - I'm working on getting a new version together - I also want to get changes like https://github.com/benfred/implicit/pull/661 and https://github.com/benfred/implicit/pull/656 pushed out to people too.

I'd also like to fix the conda packaging errors with this version - once I have a handle on that I'll push out a new release.

benfred avatar Jun 06 '23 22:06 benfred

@gallir - fix is in v0.7.0
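For anyone checking whether their environment already has the fix, a small sketch along these lines may help. The helper names `parse_version` and `has_loss_fix` are hypothetical, and the comparison only looks at the leading numeric part of each version component; `importlib.metadata.version` is the standard-library call for reading an installed package's version.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '0.7.0' into (0, 7, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_loss_fix(installed: str, fixed_in: str = "0.7.0") -> bool:
    """Return True if the installed version is at least the fixed release."""
    return parse_version(installed) >= parse_version(fixed_in)


try:
    installed = version("implicit")
    print(installed, "has fix:", has_loss_fix(installed))
except PackageNotFoundError:
    print("implicit is not installed in this environment")
```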

benfred avatar Jun 13 '23 05:06 benfred

Thank you very much. I had modified your build yml to use your latest version, and it worked better than before: https://github.com/gallir/implicit

gallir avatar Jun 13 '23 10:06 gallir