photon-ml

Add Parquet for GAME model training/scoring

Open XianXing opened this issue 8 years ago • 5 comments

It would be helpful to support Parquet-formatted data for GAME model training and scoring, since Parquet is a first-class citizen in Apache Spark.

XianXing avatar Oct 21 '16 22:10 XianXing

Since this issue seems to depend on PR #179, do we have a rough ETA for #179?

XianXing avatar Oct 21 '16 22:10 XianXing

IMHO this one would be best handled by the community: at LinkedIn, we are pretty busy with features needed by LinkedIn. We can chaperone the development of a Parquet connector, but it is unlikely that we will spend time on it in the near future. So I'm going to label this one "Help wanted".

fastier-li avatar Mar 07 '17 22:03 fastier-li

@XianXing #179 was merged in Nov 2016. Moreover, we are going to redo the Driver so that it is more like a "script" that calls library functions to prepare the data, the indexes, the normalization contexts, and so on. That direction will let us support other data formats (GameEstimator will work off of a DataFrame).

fastier-li avatar Mar 16 '17 00:03 fastier-li

Thanks for the update.

XianXing avatar Mar 16 '17 00:03 XianXing

The basic design for this should be a ParquetDataReader that outputs a DataFrame and the other data structures needed by GameEstimator.fit. We will keep it as help wanted: it is good to have, but might not be handled soon by LinkedIn staff.

fastier-li avatar Mar 16 '17 14:03 fastier-li
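
For anyone picking this up, here is a minimal sketch of what the DataFrame half of the ParquetDataReader described above might look like. It is only an illustration: the class name comes from the comment above, but the constructor, the `read` method, and its signature are assumptions and not photon-ml's actual API. The only library call used is Spark's `DataFrameReader.parquet`. The other data structures needed by GameEstimator.fit (indexes, normalization contexts) are not covered.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical reader: the name follows the design comment in this thread,
// but the constructor and method signature are assumptions for illustration.
class ParquetDataReader(spark: SparkSession) {

  // Load one or more Parquet paths into a single DataFrame.
  // Spark reads the schema from the Parquet file metadata.
  def read(paths: Seq[String]): DataFrame = {
    require(paths.nonEmpty, "at least one input path is required")
    spark.read.parquet(paths: _*)
  }
}
```

Assuming an existing `SparkSession` named `spark`, usage could be as simple as `val df = new ParquetDataReader(spark).read(Seq("/path/to/training-data"))`, where the path is a placeholder.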