photon-ml
Add Parquet for GAME model training/scoring
It would be helpful to support Parquet as a data format for GAME model training and scoring, since Parquet is a first-class citizen in Apache Spark.
Since it seems like this issue is dependent on PR #179, I am curious to know whether we have a rough ETA for #179?
This one IMHO would be best handled by the community, meaning that at LinkedIn, we are pretty busy with features needed by LinkedIn. We can chaperone the development of a Parquet connector, but there is little probability that we would spend time on it in the near future. So I'm going to label this one Help wanted.
@XianXing #179 was merged in Nov 2016. Moreover, we are going to redo the Driver so that it is more like a "script" calling library functions to prepare the data, the indexes, the normalization contexts... so we are going in that direction to support other data formats (GameEstimator will work off of DataFrame).
Thanks for the update.
The basic design for this should be a ParquetDataReader that outputs a DataFrame and other data structures needed by GameEstimator.fit. We will keep it as help wanted: it is good to have, but might not be handled soon by LinkedIn staff.
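For anyone picking this up, a minimal sketch of what such a reader might look like is below. It is only an illustration, not photon-ml code: the class shape, the `read` method, and the `requiredColumns` parameter are all hypothetical, and it does not build the other data structures that GameEstimator.fit needs (indexes, normalization contexts). The only real API it relies on is Spark's `spark.read.parquet`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch: this class and its method signature are assumptions
// made for illustration; they are not part of the photon-ml API.
class ParquetDataReader(spark: SparkSession) {

  /**
   * Load one or more Parquet paths into a single DataFrame and check that
   * the columns the caller expects are present.
   *
   * @param paths           Parquet files or directories to read
   * @param requiredColumns column names that must exist (names are up to the caller)
   */
  def read(paths: Seq[String], requiredColumns: Seq[String]): DataFrame = {
    require(paths.nonEmpty, "At least one input path is required")

    // DataFrameReader.parquet accepts a varargs list of paths.
    val df = spark.read.parquet(paths: _*)

    // Fail early if an expected column is absent from the Parquet schema.
    val missing = requiredColumns.filterNot(df.columns.contains)
    require(missing.isEmpty, s"Missing columns: ${missing.mkString(", ")}")

    df
  }
}
```

A caller would construct it with an existing SparkSession and pass whatever column names its training data uses, for example `new ParquetDataReader(spark).read(Seq(inputDir), Seq("uid", "response"))`, where `inputDir`, `uid` and `response` are placeholders.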