
Use user ID and item ID instead of matrix indices for ALS

Open xwu99 opened this issue 3 years ago • 0 comments

The current ALS data input uses a row index and a column index to locate each rating in the rating matrix. In real use cases, ratings are usually keyed by a user ID and an item ID, represented as integers or strings. For example:

Using strings as IDs:

"User1", "Item1", 1.0
"User2", "Item2", 2.0

Or using integers as IDs:

1234, 4567, 1.0
4321, 5678, 2.0

=> Row/column indices for oneDAL:

0, 0, 1.0
1, 1, 2.0

Other frameworks, such as Spark MLlib ALS, handle these string/integer IDs out of the box. With oneDAL, we need an extra data step to map each user ID/item ID to a row index/column index before calling DAL ALS, and then map the results back afterwards. Is it possible, or more efficient, to handle this inside DAL and remove these extra preprocessing/postprocessing steps?
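For illustration, the extra mapping step described above could look roughly like the sketch below. It is plain Python and does not call any oneDAL API; the helper names (`build_index`, `encode_ratings`, `invert_index`) are hypothetical and exist only for this example. It assumes IDs are hashable (strings or integers) and assigns zero-based indices in order of first appearance.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Rating = Tuple[Hashable, Hashable, float]


def build_index(ids: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign consecutive zero-based indices to IDs in order of first appearance."""
    index: Dict[Hashable, int] = {}
    for raw_id in ids:
        if raw_id not in index:
            index[raw_id] = len(index)
    return index


def encode_ratings(ratings: Iterable[Rating]):
    """Preprocessing: turn (user_id, item_id, rating) into (row, col, rating)."""
    ratings = list(ratings)
    user_index = build_index(user for user, _, _ in ratings)
    item_index = build_index(item for _, item, _ in ratings)
    triples = [(user_index[u], item_index[i], r) for u, i, r in ratings]
    return triples, user_index, item_index


def invert_index(index: Dict[Hashable, int]) -> List[Hashable]:
    """Postprocessing: position k of the result holds the original ID for index k."""
    ids: List[Hashable] = [None] * len(index)
    for raw_id, position in index.items():
        ids[position] = raw_id
    return ids


# String IDs, as in the example above (hypothetical data).
ratings = [("User1", "Item1", 1.0), ("User2", "Item2", 2.0)]
triples, user_index, item_index = encode_ratings(ratings)

# `triples` is what would be handed to an index-based ALS implementation;
# the inverted tables map result rows/columns back to the original IDs.
user_ids = invert_index(user_index)
item_ids = invert_index(item_index)
```

The same helpers work unchanged for integer IDs such as `1234` or `4567`, since they only rely on the IDs being usable as dictionary keys. On large datasets these dictionaries would have to be built and kept outside the library, which is the overhead this issue asks about.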

xwu99 • Mar 21 '21 02:03