[ALS] Use user ID and item ID instead of matrix indices for ALS

Open xwu99 opened this issue 3 years ago • 0 comments

The current ALS input uses a row index and a column index to locate each rating in the rating matrix. In real use cases, users and items are usually identified by a user ID and an item ID, represented as integers or strings. For example:

Using strings as IDs:

"User1", "Item1", 1.0
"User2", "Item2", 2.0

Or using integers as IDs:

1234, 4567, 1.0
4321, 5678, 2.0

=> Row/column indices for oneDAL:

0, 0, 1.0
1, 1, 2.0

Other frameworks, such as Spark MLlib ALS, handle these string/integer IDs out of the box. We need an extra data-preparation step that maps user IDs and item IDs to row and column indices before calling oneDAL ALS, and maps the indices back to IDs afterwards.
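The mapping step could look roughly like the sketch below. It is an illustration only, written in plain Python rather than against the oneDAL or OAP MLlib APIs, and the helper names (`build_index`, `encode_ratings`, `invert`) are hypothetical. It assumes the ratings fit in memory and assigns zero-based indices in order of first appearance; a distributed implementation would need a different indexing strategy.

```python
def build_index(ids):
    """Assign a zero-based index to each distinct ID, in order of first appearance."""
    index = {}
    for raw_id in ids:
        if raw_id not in index:
            index[raw_id] = len(index)
    return index


def encode_ratings(ratings):
    """Turn (user_id, item_id, rating) records into (row, column, rating) triples.

    Returns the triples plus both lookup tables so results can be mapped back.
    """
    ratings = list(ratings)
    user_index = build_index(user for user, _, _ in ratings)
    item_index = build_index(item for _, item, _ in ratings)
    triples = [(user_index[user], item_index[item], rating)
               for user, item, rating in ratings]
    return triples, user_index, item_index


def invert(index):
    """Build the reverse lookup: position i holds the original ID for index i."""
    ids = [None] * len(index)
    for raw_id, position in index.items():
        ids[position] = raw_id
    return ids


# String IDs and integer IDs go through the same code path.
ratings = [("User1", "Item1", 1.0), ("User2", "Item2", 2.0)]
triples, user_index, item_index = encode_ratings(ratings)
user_ids = invert(user_index)   # row index -> original user ID
item_ids = invert(item_index)   # column index -> original item ID
```

After training, `user_ids[row]` and `item_ids[column]` recover the original IDs for any row or column that the model reports.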

Also refer to https://github.com/oneapi-src/oneDAL/issues/1514

xwu99, May 10 '21 12:05