oneDAL
Use user ID and item ID instead of matrix indices for ALS
The current ALS data input uses a row index and a column index to locate each rating in the rating matrix. In real use cases, ratings are usually keyed by a user ID and an item ID, which are represented as integers or strings. For example:
Using strings as IDs:
"User1", "Item1", 1.0
"User2", "Item2", 2.0
Or using integers as IDs:
1234, 4567, 1.0
4321, 5678, 2.0
=> Row / column index for oneDAL:
0, 1, 1.0
1, 1, 2.0
Other frameworks such as Spark MLlib ALS handle these string/integer IDs out of the box. With oneDAL, we need an extra data step to map each user ID/item ID to a row index/column index before calling oneDAL ALS, and then another step to map the results back. Would it be possible, or more efficient, to handle this inside oneDAL and remove these extra data preprocessing/postprocessing steps?
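For reference, the extra mapping step described above might look roughly like the following. This is a minimal pure-Python sketch and does not use any oneDAL API; the helper names (build_index, encode_ratings, invert) are hypothetical and only illustrate assigning dense 0-based indices in first-seen order and recovering the original IDs afterwards.

```python
def build_index(ids):
    """Assign each distinct ID a dense 0-based index in first-seen order."""
    index = {}
    for key in ids:
        # len(index) is evaluated before insertion, so new IDs get 0, 1, 2, ...
        index.setdefault(key, len(index))
    return index


def encode_ratings(ratings):
    """Turn (user_id, item_id, rating) into (row, col, rating) triples.

    Also returns the two ID-to-index dictionaries so results can be mapped back.
    """
    ratings = list(ratings)
    user_index = build_index(user for user, _, _ in ratings)
    item_index = build_index(item for _, item, _ in ratings)
    triples = [(user_index[u], item_index[i], r) for u, i, r in ratings]
    return triples, user_index, item_index


def invert(index):
    """Build a list where position k holds the original ID for index k."""
    ids = [None] * len(index)
    for key, pos in index.items():
        ids[pos] = key
    return ids


# Hypothetical input; IDs may be strings or integers (any hashable value).
ratings = [
    ("User1", "Item1", 1.0),
    ("User2", "Item2", 2.0),
    ("User1", "Item2", 3.0),
]

# Preprocessing: IDs -> matrix indices, ready to pass to an ALS implementation.
triples, user_index, item_index = encode_ratings(ratings)

# Postprocessing: matrix indices -> original IDs.
user_ids = invert(user_index)
item_ids = invert(item_index)
```

Both dictionaries have to be kept alongside the trained model so that predictions for a given row/column can be reported under the original user ID/item ID, which is the bookkeeping this request asks oneDAL to take over.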