tensorrec

Calculate Normalised Discounted Cumulative Gain Error

Open • gallmerci opened this issue 6 years ago • 6 comments

I get the following error when running fit_and_eval from tensorrec.eval:

  File "/usr/local/lib/python3.5/dist-packages/tensorrec/eval.py", line 81, in _dcg
    numer = (2**np.multiply(relevance.data, k_mask)) - 1
ValueError: operands could not be broadcast together with shapes (5347038,) (52804,)

https://github.com/jfkirk/tensorrec/blob/65fefe4437c8974b39cc9ab56b9769ed9eb70ffa/tensorrec/eval.py#L81

Looking at the source code and at the definition of discounted cumulative gain, I think the calculation of k_mask is not correct here, because it is computed from the data array of a different sparse matrix, so its length need not match that of relevance.data.

https://github.com/jfkirk/tensorrec/blob/65fefe4437c8974b39cc9ab56b9769ed9eb70ffa/tensorrec/eval.py#L68
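A toy illustration of the mismatch described above. The matrices are made up, and the names relevance, ror and k_mask only mirror the ones in eval.py. When two sparse matrices of the same shape store a different number of entries, their .data arrays have different lengths, so a mask built from one cannot be multiplied elementwise with the other's data.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical graded relevance for 2 users x 3 items: 3 stored entries.
relevance = sp.csr_matrix(np.array([[3, 0, 1],
                                    [0, 2, 0]]))
# Hypothetical 1-based ranks with a different sparsity pattern: 4 stored entries.
ror = sp.csr_matrix(np.array([[1, 2, 0],
                              [4, 0, 5]]))

k = 2
# Mask built from ror's .data, so it has ror.nnz elements, not relevance.nnz.
k_mask = ror.data < k + 1

print(relevance.data.shape, k_mask.shape)  # (3,) (4,)
try:
    numer = (2 ** np.multiply(relevance.data, k_mask)) - 1
except ValueError as err:
    # Same failure mode as in the traceback, just with smaller shapes.
    print(err)
```

The real shapes in the traceback, (5347038,) vs (52804,), follow the same pattern: the two matrices simply store very different numbers of entries.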

Instead it should, in my opinion, use the entire ror matrix, something like

k_mask = ror < k+1

The < operator is quite inefficient on sparse matrices, but I hope it is clear what I mean :)
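Comparing the whole matrix with ror < k+1 has a second problem besides speed: for a positive scalar, SciPy emits a SparseEfficiencyWarning, and every implicit (unstored) zero also compares True, so all non-relevant positions would end up inside the mask. Below is a minimal sketch of an alternative that keeps the mask aligned by construction. It assumes the predicted ranks are available as a dense array of 1-based ranks with the same (n_users, n_items) shape as the relevance matrix. dcg_at_k is a hypothetical helper, not tensorrec's implementation, and it stops at DCG; NDCG would additionally divide by the ideal DCG.

```python
import numpy as np
import scipy.sparse as sp


def dcg_at_k(relevance, predicted_ranks, k):
    """Per-user DCG@k over the stored entries of `relevance` (hypothetical helper).

    relevance: scipy sparse (n_users, n_items) matrix of graded relevance.
    predicted_ranks: dense (n_users, n_items) array of 1-based ranks (assumed layout).
    """
    rel = sp.csr_matrix(relevance, dtype=float)
    rel.sum_duplicates()   # one stored value per (user, item)
    rel.eliminate_zeros()  # drop explicitly stored zeros
    rel = rel.tocoo()

    # Look up the rank of exactly the entries stored in `rel`, so the mask
    # below has the same length as rel.data by construction.
    ranks = np.asarray(predicted_ranks)[rel.row, rel.col]
    in_top_k = ranks <= k

    # Standard DCG term: (2^rel - 1) / log2(rank + 1), zeroed outside the top k.
    gains = np.where(in_top_k, (2.0 ** rel.data - 1.0) / np.log2(ranks + 1.0), 0.0)

    # Sum the per-entry gains for each user (row).
    return np.bincount(rel.row, weights=gains, minlength=rel.shape[0])
```

For example, with relevance [[1, 0, 1], [0, 1, 0]] and predicted ranks [[1, 2, 3], [3, 1, 2]], dcg_at_k(..., k=2) gives [1.0, 1.0], and k=3 gives [1.5, 1.0], because the item ranked 3rd for the first user now contributes 1 / log2(4) = 0.5.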

gallmerci • Nov 28 '18

Hey @gallmerci! Thanks for reporting.

What are the shapes of your user_features, item_features, and interactions?

jfkirk • Dec 06 '18

Hey @jfkirk,

I have the following shapes:

Shape of item features data: (120, 33)
Shape of training user data: (3154, 12)
Shape of interaction data: (3154, 120)
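A quick consistency check on shapes like these. It assumes the usual layout where interactions are (n_users, n_items), user features have one row per user, and item features have one row per item; shape_problems is a hypothetical helper, not part of tensorrec.

```python
def shape_problems(interactions_shape, user_features_shape, item_features_shape):
    """Return a list of mismatches between interaction and feature shapes.

    Assumed layout: interactions (n_users, n_items),
    user_features (n_users, n_user_features),
    item_features (n_items, n_item_features).
    """
    n_users, n_items = interactions_shape
    problems = []
    if user_features_shape[0] != n_users:
        problems.append(
            f"user_features has {user_features_shape[0]} rows, "
            f"interactions has {n_users} users"
        )
    if item_features_shape[0] != n_items:
        problems.append(
            f"item_features has {item_features_shape[0]} rows, "
            f"interactions has {n_items} items"
        )
    return problems


# The shapes reported above are mutually consistent under this layout.
print(shape_problems((3154, 120), (3154, 12), (120, 33)))  # []
```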

gallmerci • Dec 06 '18

I'm also having this issue; the traceback points to the NDCG method:

/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/scipy/sparse/compressed.py:202: RuntimeWarning: invalid value encountered in greater
  res = self._with_data(op(self.data, other), copy=True)
Traceback (most recent call last):
  File "TF_fit_eval.py", line 111, in <module>
    fit_kwargs=fit_kwargs)
  File "~/tensorrec/tensorrec/eval.py", line 160, in fit_and_eval
    n_at_k = ndcg_at_k(predicted_ranks, test_interactions, k=ndcg_k)
  File "~/tensorrec/tensorrec/eval.py", line 108, in ndcg_at_k
    dcg = np.asarray(_dcg(relevance, k_mask, ror_at_k, ranks_of_relevant))[0]
  File "~/tensorrec/tensorrec/eval.py", line 81, in _dcg
    numer = (2**np.multiply(relevance.data, k_mask)) - 1
ValueError: operands could not be broadcast together with shapes (1167,) (95,)
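The RuntimeWarning above comes from SciPy comparing a sparse matrix's stored values against a scalar. An "invalid value encountered" warning in a comparison usually points to NaN among the compared values, although that is an assumption and is not confirmed in this thread. Here is a hypothetical check (count_non_finite is not part of tensorrec) for NaN or infinite values stored in the matrices passed to fit_and_eval:

```python
import numpy as np
import scipy.sparse as sp


def count_non_finite(matrix):
    """Count NaN and +/-inf values among a sparse matrix's stored entries."""
    # COO keeps duplicate entries as-is, so nothing is summed away here.
    data = np.asarray(sp.coo_matrix(matrix).data, dtype=float)
    return {"nan": int(np.isnan(data).sum()), "inf": int(np.isinf(data).sum())}


# Hypothetical toy data; substitute your own test interactions or predictions.
toy = sp.csr_matrix(np.array([[1.0, 0.0, np.nan],
                              [0.0, np.inf, 2.0]]))
print(count_non_finite(toy))  # {'nan': 1, 'inf': 1}
```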

kevglynn • Dec 11 '18

@kevglynn Can you provide an example dataset where this happens? I can't reproduce this error at the moment... I understand under what circumstances this error occurs, but I don't understand how those circumstances can arise at all :)
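One way such a mismatch could arise, offered as an assumption rather than a confirmed cause: two sparse matrices can describe the same logical values yet store a different number of entries, for example when one holds explicitly stored zeros or duplicate (row, col) coordinates. Their .data arrays then differ in length. The sketch below shows this with made-up data; canonical_csr is a hypothetical helper that sums duplicates and drops stored zeros.

```python
import numpy as np
import scipy.sparse as sp

rows = np.array([0, 0, 1])
cols = np.array([0, 2, 1])
vals = np.array([1.0, 1.0, 1.0])

clean = sp.coo_matrix((vals, (rows, cols)), shape=(2, 3))
# Same values plus an explicitly stored zero at (1, 2).
with_zero = sp.coo_matrix(
    (np.append(vals, 0.0), (np.append(rows, 1), np.append(cols, 2))), shape=(2, 3)
)
# Same values, but (0, 0) is stored twice as 0.5 + 0.5.
with_dup = sp.coo_matrix(
    (np.array([0.5, 0.5, 1.0, 1.0]), (np.array([0, 0, 0, 1]), np.array([0, 0, 2, 1]))),
    shape=(2, 3),
)


def canonical_csr(matrix):
    """Return a CSR copy with duplicates summed and explicit zeros dropped."""
    m = sp.csr_matrix(matrix, copy=True)
    m.sum_duplicates()
    m.eliminate_zeros()
    return m


for name, m in [("clean", clean), ("with_zero", with_zero), ("with_dup", with_dup)]:
    # Stored entries before and after canonicalisation.
    print(name, m.nnz, canonical_csr(m).nnz)
# clean 3 3
# with_zero 4 3
# with_dup 4 3
```

All three matrices have identical dense values, so a mask computed from one matrix's .data would still not line up with another's unless both are brought to the same canonical form first.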

gallmerci • Dec 11 '18

@gallmerci Sorry for the delay. It only happens for me with a specific dataset, which I can't share (it's proprietary). I will try to come up with an example that reproduces it...

kevglynn • Dec 16 '18

Hey all -- any luck with reproducible examples? I'd love to get to the bottom of this, but I've been poking at it and haven't been able to reproduce it.

jfkirk • Jan 06 '19