
Cosine full vector implementation

Open ODemidenko opened this issue 7 years ago • 7 comments

Implemented cosine over full vectors (as was requested; the adjusted-cosine PR will be made separately, after this one). Tests and docs added.
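For clarity, here is a minimal sketch of the distinction this PR is about (the function names and the zero-for-unrated convention are illustrative only, not the actual Surprise internals):

```python
import numpy as np


def cosine_full_vector(ru, rv):
    """Cosine over two users' full rating vectors.

    ru, rv: 1-D arrays over the whole item set, with 0 for unrated items,
    so every rating contributes to the similarity, not only co-rated items.
    """
    norm_u = np.linalg.norm(ru)
    norm_v = np.linalg.norm(rv)
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return float(np.dot(ru, rv) / (norm_u * norm_v))


def cosine_common_only(ru, rv):
    """Cosine restricted to items rated by both users (the existing behaviour)."""
    common = (ru != 0) & (rv != 0)
    if not common.any():
        return 0.0
    return cosine_full_vector(ru[common], rv[common])
```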

ODemidenko avatar Feb 05 '18 09:02 ODemidenko

Could you please provide a few benchmarks (RMSE, MAE, maybe recall / precision as in the FAQ, computation time...)?

No, sorry, I don't have time for this measurement. I added full cosine only to cover all possible options and to prepare common ground for adjusted cosine (basically, I am just adding the features necessary to pass the "recommender systems" specialization on Coursera, as I have seen complaints there about a lack of Python libs supporting this course).

Regarding the reformatting in the test examples: with the previous formatting they were awfully confusing, being improperly aligned (I probably broke it with my previous PR). This certainly should be fixed.

The other issues are fixed.

ODemidenko avatar Feb 05 '18 12:02 ODemidenko

No, sorry, I don't have time for this measurement.

I'm sorry but I can't accept a new algorithm / similarity measure without even having a vague idea of its performance.

NicolasHug avatar Feb 05 '18 12:02 NicolasHug

I'm sorry but I can't accept a new algorithm / similarity measure without even having a vague idea of its performance.

For the existing similarity metrics you haven't made such measurements yourself; at least, I have only found measurements for a single default similarity metric. So perhaps you would allow skipping this for a new similarity metric as well?

Otherwise, I kindly ask you to provide detailed requirements for what you expect of me.
(It would make sense to add this to the "Contributors" instructions as well, so that prospective contributors understand all your requirements in advance.) Regarding your wish to have computation time reported: I have a different machine than yours, so this value will be misleading in comparison to other algorithms. So I suggest skipping this requirement as well.

ODemidenko avatar Feb 05 '18 13:02 ODemidenko

For the existing similarity metrics you haven't made such measurements yourself

Oh, trust me, I have.

So perhaps you would allow skipping this for a new similarity metric as well?

No, please don't ask me this, really.

Otherwise, I kindly ask you to provide detailed requirements for what you expect of me.

I'm not asking for much. A few CV procedures on ml-100k and ml-1m, comparing performance with only_common_ratings=True and only_common_ratings=False, would be a good start.
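For concreteness, something along these lines would do (a rough sketch only; the only_common_ratings key is the option proposed in this PR and is assumed here to be exposed through sim_options, it is not part of a released Surprise version):

```python
from surprise import Dataset, KNNBasic
from surprise.model_selection import cross_validate

# Rough benchmark sketch: 5-fold CV on ml-100k (repeat with 'ml-1m'),
# comparing cosine with and without the proposed only_common_ratings option.
data = Dataset.load_builtin('ml-100k')

for only_common in (True, False):
    sim_options = {
        'name': 'cosine',
        'user_based': True,
        'only_common_ratings': only_common,  # assumed key exposed by this PR
    }
    algo = KNNBasic(sim_options=sim_options)
    results = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=False)
    mean_fit = sum(results['fit_time']) / len(results['fit_time'])
    print('only_common_ratings={}: RMSE={:.4f} MAE={:.4f} mean fit time={:.1f}s'.format(
        only_common, results['test_rmse'].mean(), results['test_mae'].mean(), mean_fit))
```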

(It would make sense to add this to the "Contributors" instructions as well, so that prospective contributors understand all your requirements in advance.)

Good idea

Regarding your wish to have computation time reported: I have a different machine than yours, so this value will be misleading in comparison to other algorithms.

Absolutely. Benchmarking is about comparing; see my point above.

NicolasHug avatar Feb 05 '18 16:02 NicolasHug

Could you give a link to an existing similarity-metric comparison, as an example of what you are asking for? Or just let me know when you will update the contributors guideline on this. Perhaps you would agree to do this comparison after we add the adjusted cosine metric as well? It seems a waste of time to do this work twice.

ODemidenko avatar Feb 05 '18 23:02 ODemidenko

I don't have any link. I'm just asking for what I've already described in my previous post.

NicolasHug avatar Feb 08 '18 17:02 NicolasHug

Any update / benchmark on this?

NicolasHug avatar Mar 06 '18 21:03 NicolasHug