spark-ranking-metrics
Why must user and item be ints?
Thanks for a useful toolkit. I agree that Spark's own ranking metrics are limited.
However, why must user and item be ints? Would "Any" be sufficient? Or a generic type?
You just need to identify these uniquely. In my scenario (IR rather than RecSys), queries (users) are strings and documents (items) are strings.
Craig
bump? any thoughts?
I totally agree that the IDs should be able to represent Any type, or Strings at least. The current implementation uses Int for a few technical reasons, but it should be easy to convert the ID types.
The technical reasons being ...
- I wanted to keep the API as close as possible to RiVal's, in order to validate correctness.
- Spark/JVM operations run more efficiently with integer fields, especially when forming an array of objects.
- In many applications those entities typically have integer IDs anyway.
Regrettably, I no longer have the resources to maintain this library, so I think the easiest approach is to build a simple mapping between your String IDs and unique integer IDs.
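For anyone landing here with the same problem, here is a minimal sketch of that kind of mapping. It is plain Python rather than Spark code, and it is not part of this library: the helper names (build_id_map, encode) and the sample judgments are made up for illustration. It assumes the set of distinct IDs fits in memory on one machine; a large dataset would need a distributed equivalent of the same idea.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def build_id_map(ids: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign each distinct ID a dense integer, in order of first appearance."""
    mapping: Dict[Hashable, int] = {}
    for raw_id in ids:
        if raw_id not in mapping:
            mapping[raw_id] = len(mapping)
    return mapping


def encode(
    rows: Iterable[Tuple[Hashable, Hashable, float]],
    user_map: Dict[Hashable, int],
    item_map: Dict[Hashable, int],
) -> List[Tuple[int, int, float]]:
    """Replace the original user/item IDs in each row with their integer codes."""
    return [(user_map[u], item_map[i], score) for u, i, score in rows]


# Hypothetical IR-style data: (query, document, relevance).
judgments = [
    ("cheap flights", "doc-a", 1.0),
    ("cheap flights", "doc-b", 0.0),
    ("spark ndcg", "doc-b", 1.0),
]

# Queries play the role of users, documents the role of items.
query_ids = build_id_map(q for q, _, _ in judgments)
doc_ids = build_id_map(d for _, d, _ in judgments)
encoded = encode(judgments, query_ids, doc_ids)

# Inverse maps, to translate integer IDs back to the original strings.
query_names = {code: name for name, code in query_ids.items()}
doc_names = {code: name for name, code in doc_ids.items()}
```

The integer triples in encoded are what you would feed to the metrics, and the inverse maps let you report results against the original query and document strings. Whatever you do, build the maps once and apply the same ones to both the ground truth and the predictions, otherwise the IDs will not line up.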