spark-ranking-metrics

Why must user and item be ints?

Open cmacdonald opened this issue 8 years ago • 2 comments

Thanks for a useful toolkit. I agree that Spark's own ranking metrics are limited.

However, why must user and item be ints? Would "Any" be sufficient? Or a generic type?

You just need to identify them uniquely. In my scenario (IR rather than RecSys), queries (users) and documents (items) are both strings.

Craig

cmacdonald avatar Oct 15 '17 20:10 cmacdonald

bump? any thoughts?

cmacdonald avatar May 03 '18 14:05 cmacdonald

I totally agree that the IDs should be able to be of Any type, or at least Strings. The current implementation uses Int for a few technical reasons, but it should be easy to convert the ID types.

The technical reasons being ...

  1. I wanted to make the API as close as possible to RiVal, in order to validate the correctness.
  2. Spark/JVM operations run more efficiently with integer fields, especially when forming an array of objects.
  3. In many applications those entities typically have integer IDs anyway.

Regretfully, I no longer have the resources to maintain this library, and I think the easiest way forward is to build a simple mapping between your String IDs and unique integer IDs.
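
A minimal sketch of such a mapping, in plain Python rather than against this library's API. The helper name `build_id_mapping` and the sample query/document IDs are made up for illustration; the idea is only to give each distinct string a dense integer and keep the inverse around for reading results back.

```python
from typing import Dict, Hashable, Iterable


def build_id_mapping(ids: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign each distinct ID a dense integer, in order of first appearance."""
    mapping: Dict[Hashable, int] = {}
    for raw_id in ids:
        if raw_id not in mapping:
            mapping[raw_id] = len(mapping)
    return mapping


# Hypothetical IR-style data: (query, document, relevance) with string IDs.
ratings = [
    ("q-apple", "doc-17", 1.0),
    ("q-apple", "doc-42", 0.0),
    ("q-pear", "doc-17", 1.0),
]

# Queries and documents get separate ID spaces.
user_ids = build_id_mapping(q for q, _, _ in ratings)
item_ids = build_id_mapping(d for _, d, _ in ratings)

# Integer-keyed triples, in the shape an Int-based API would expect.
encoded = [(user_ids[q], item_ids[d], rel) for q, d, rel in ratings]

# Inverse lookups, to translate integer results back to the original strings.
user_names = {i: q for q, i in user_ids.items()}
item_names = {i: d for d, i in item_ids.items()}
```

This builds the mapping in memory on one machine, so it only suits ID sets that fit there; for a large Spark dataset the same idea would need a distributed equivalent.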

jongwook avatar May 03 '18 14:05 jongwook