Better user experience for XShards of Pandas Dataframe input for Orca Estimators
Problem
For XShards of Pandas DataFrame, it is more common for several columns together to serve as one model input. In the current feature_cols context, each column serves as one model input. It can be cumbersome for users to merge their original pandas DataFrame feature columns into a single column in which each cell contains a list or an ndarray.
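As a rough illustration of the conversion described above, the sketch below packs four scalar feature columns into one column whose cells are lists. The column names f1 to f4 and the packed column name "features" are illustrative assumptions, and the snippet uses plain pandas only, not any Orca API.

```python
import pandas as pd

# Illustrative data: four scalar feature columns (names are assumptions).
df = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0],
    "f2": [4.0, 5.0, 6.0],
    "f3": [7.0, 8.0, 9.0],
    "f4": [10.0, 11.0, 12.0],
})
feature_names = ["f1", "f2", "f3", "f4"]

# Pack each row's feature values into a single list-valued cell, so that
# one column can act as one model input under the current semantics.
df["features"] = df[feature_names].to_numpy().tolist()
packed = df.drop(columns=feature_names)

print(packed)
```

Every user with a wide feature table has to repeat a step like this, which is the overhead the proposal aims to remove.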
Design (open to discussion)
We could give feature_cols different meanings for Spark DataFrame and XShards of Pandas DataFrame.
- For Spark Dataframe, each feature column should be one input to the model;
- For XShards of Pandas DataFrame, each feature column could be one feature, and we would internally concatenate the feature columns into one input before feeding it to the model. E.g., if feature_cols = ["f1", "f2", "f3", "f4"], the model should expect one input with shape (batch_size, 4); if feature_cols = [["f1", "f2"], ["f3", "f4"]], the model should expect two inputs, each with shape (batch_size, 2).
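The proposed semantics for XShards of Pandas DataFrame could be sketched roughly as follows. The helper columns_to_model_inputs is hypothetical and is not part of the Orca API; it only shows how a flat or nested feature_cols might map to model inputs on a single pandas DataFrame (one shard).

```python
import pandas as pd


def columns_to_model_inputs(df, feature_cols):
    """Hypothetical helper (not an Orca API): map feature_cols to model inputs.

    A flat list of column names yields one array of shape (rows, n_columns).
    A nested list yields one such array per inner group.
    """
    if all(isinstance(col, str) for col in feature_cols):
        # Flat list: concatenate all columns into a single input.
        return df[list(feature_cols)].to_numpy()
    # Nested list: each inner group becomes a separate input.
    return [df[list(group)].to_numpy() for group in feature_cols]


# Illustrative shard with three rows.
df = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0],
    "f2": [4.0, 5.0, 6.0],
    "f3": [7.0, 8.0, 9.0],
    "f4": [10.0, 11.0, 12.0],
})

single = columns_to_model_inputs(df, ["f1", "f2", "f3", "f4"])
pair = columns_to_model_inputs(df, [["f1", "f2"], ["f3", "f4"]])

print(single.shape)             # (3, 4)
print([a.shape for a in pair])  # [(3, 2), (3, 2)]
```

Here the row count of 3 plays the role of batch_size. How the real implementation would batch across shards is left open, as in the design above.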
Related issues
#5060 https://github.com/intel-analytics/BigDL/issues/4965
Is this issue related? https://github.com/intel-analytics/BigDL/issues/4448
See https://github.com/intel-analytics/BigDL/issues/4965#issuecomment-1184515330