Supports More Operations for Recommendation Systems

Open Ash-Zheng opened this issue 3 years ago • 1 comments

Hi,

I noticed that some data preprocessing operations used in recommendation systems like bucketize, sigridHash, and firstX are implemented in: torcharrow/tree/main/csrc/velox/functions/rec

Will other preprocessing operations for recommendation systems be supported in the future? For example, a recent paper from Meta [1] lists 16 common preprocessing operations in Table 11, including bucketize, sigridHash, firstX, Cartesian, IdListTransform, BoxCox, MapId, and NGram. Most of them are not supported yet. Are there plans to add them to torcharrow?

[1] Zhao, Mark, et al. "Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product." Proceedings of the 49th Annual International Symposium on Computer Architecture. 2022.
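
To make the request concrete, here is a rough pure-Python sketch of the behavior I have in mind for a few of these operations. The function names, signatures, and exact semantics below are my own assumptions for illustration; they are not taken from the paper or from the torcharrow/Velox implementations, which may differ (for example in how bucket borders are treated).

```python
from bisect import bisect_left
from itertools import product


def bucketize(value, borders):
    # Assumed semantics: index of the bucket that `value` falls into,
    # given sorted `borders`; a value equal to a border goes to the lower bucket.
    return bisect_left(borders, value)


def first_x(ids, x):
    # Assumed semantics: keep at most the first `x` elements of an id list.
    return ids[:x]


def ngram(ids, n):
    # Assumed semantics: all contiguous windows of length `n` over an id list.
    return [tuple(ids[i:i + n]) for i in range(len(ids) - n + 1)]


def cartesian(ids_a, ids_b):
    # Assumed semantics: all pairs formed from two id lists (feature crossing).
    return list(product(ids_a, ids_b))
```

In a real pipeline each pair or n-gram would probably be hashed back into a single id, but I left that out to keep the sketch short.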

Ash-Zheng, Sep 21 '22 00:09

cc @YLGH

wenleix, Oct 04 '22 06:10