ipex-llm [Discussion] Operations needed to be supported in shards

Open • dding3 opened this issue 2 years ago • 1 comment

To improve the user experience of Orca shards, this issue discusses which operations need to be supported in Orca shards.

- [ ] Scaler
  - [x] minmaxscaler
  - [x] standardscaler https://github.com/intel-analytics/BigDL/pull/5716
- [ ] Encode categorical variables
  - [x] label encoder
  - [ ] one-hot encoding (get_dummies in pandas)
- [ ] Merge (join): tracked as a task in https://github.com/orgs/analytics-zoo/projects/14/views/4
- [ ] Not (the ~ operator in pandas)
- [ ] Statistics
  - [ ] missing values
    - [ ] count missing values for each column
    - [ ] delete null values
    - [ ] fill in null values (possibly with various imputation methods)
  - [ ] groupby
  - [ ] agg
  - [ ] mean
  - [ ] max
  - [ ] sum
  - [ ] sort_values (nice to have)
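
One-hot encoding (the open get_dummies item above) is one place where sharding changes the algorithm: if each shard encoded its own rows independently, a shard that never sees a category would end up with a different set of columns. The sketch below assumes each shard is a plain Python list of row dicts, a hypothetical stand-in for a DataFrame partition and not the Orca shards API. It makes two passes: first collect the global category set across all shards, then encode every shard against that shared list.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Shard = List[Row]  # hypothetical layout: one shard = a list of row dicts


def collect_categories(shards: List[Shard], column: str) -> List[Any]:
    """Pass 1: gather the sorted, global set of categories across all shards."""
    seen = set()
    for shard in shards:
        for row in shard:
            value = row.get(column)
            if value is not None:  # missing values get no indicator column
                seen.add(value)
    return sorted(seen)


def one_hot(shards: List[Shard], column: str) -> List[Shard]:
    """Pass 2: replace `column` with 0/1 columns named `<column>_<category>`."""
    categories = collect_categories(shards, column)
    encoded = []
    for shard in shards:
        new_shard = []
        for row in shard:
            new_row = {k: v for k, v in row.items() if k != column}
            for category in categories:
                new_row[f"{column}_{category}"] = int(row.get(column) == category)
            new_shard.append(new_row)
        encoded.append(new_shard)
    return encoded


# Shard 0 never sees "green", yet it still gets a color_green column.
shards = [
    [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}],
    [{"id": 3, "color": "green"}],
]
encoded = one_hot(shards, "color")
print(encoded[0][0])  # {'id': 1, 'color_blue': 0, 'color_green': 0, 'color_red': 1}
```

The first pass only ships category values between shards, not rows, so its cost scales with the number of distinct categories.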

The operations above are motivated by the following notebooks:

https://www.kaggle.com/code/pmarcelino/comprehensive-data-exploration-with-python

Operations used:
1. isnull, sum, sort_values, standard_scaler, get_dummies
2. Nice to have: describe (summary of the DataFrame), correlation, arg_sort
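
The isnull, sum, sort_values combination from the first notebook (counting missing values per column and ranking the columns) decomposes into a map step that counts per shard and a reduce step that adds the counts. A minimal sketch, assuming each shard is a plain Python list of row dicts (hypothetical, not the Orca shards API); treating None and float NaN as missing is meant to resemble pandas' isnull, not reproduce it exactly.

```python
import math
from collections import Counter
from typing import Any, Dict, List, Tuple

Shard = List[Dict[str, Any]]  # hypothetical layout: one shard = a list of row dicts


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing, roughly like pandas' isnull."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing_in_shard(shard: Shard, columns: List[str]) -> Counter:
    """Map step: per-column missing-value counts for a single shard."""
    counts = Counter({col: 0 for col in columns})
    for row in shard:
        for col in columns:
            if is_missing(row.get(col)):  # an absent key also counts as missing
                counts[col] += 1
    return counts


def count_missing(shards: List[Shard], columns: List[str]) -> List[Tuple[str, int]]:
    """Reduce step: add per-shard counts, then sort columns by count, descending."""
    total = Counter({col: 0 for col in columns})
    for shard in shards:
        total.update(count_missing_in_shard(shard, columns))  # update() adds counts
    return sorted(total.items(), key=lambda item: item[1], reverse=True)


shards = [
    [{"a": 1, "b": None}, {"a": None, "b": None}],
    [{"a": 3, "b": float("nan")}],
]
print(count_missing(shards, ["a", "b"]))  # [('b', 3), ('a', 1)]
```

Only one small counter per shard crosses shard boundaries, so the global sort runs on a handful of numbers rather than on the data.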

https://www.kaggle.com/code/isaienkov/riiid-answer-correctness-prediction-eda-modeling

Operations used:
1. isnull, sum, groupby, agg, merge, fillna, not
2. Nice to have: sklearn.feature_selection.rfe
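
Of the second notebook's operations, merge is the one that needs data movement, because rows sharing a join key may sit in different shards. One common approach, sketched here as an assumption rather than how Orca implements or plans to implement it, is a shuffle hash join: repartition both sides by a hash of the key, then join each pair of co-located partitions locally. The layout (each shard a plain Python list of row dicts) and the column names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]
Shard = List[Row]  # hypothetical layout: one shard = a list of row dicts


def repartition_by_key(shards: List[Shard], key: str, num_partitions: int) -> List[Shard]:
    """Shuffle step: send each row to partition hash(key) % num_partitions.

    Python salts str hashes per process, so a real multi-process
    implementation would need a deterministic hash function instead.
    """
    partitions: List[Shard] = [[] for _ in range(num_partitions)]
    for shard in shards:
        for row in shard:
            partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def local_inner_join(left: Shard, right: Shard, key: str) -> Shard:
    """Hash join inside one partition. Overlapping non-key columns take the
    right-hand value (pandas' merge would add _x/_y suffixes instead)."""
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for l_row in left:
        for r_row in index.get(l_row[key], []):
            joined.append({**l_row, **r_row})
    return joined


def sharded_inner_join(
    left_shards: List[Shard], right_shards: List[Shard], key: str, num_partitions: int = 4
) -> List[Shard]:
    """Co-partition both sides by key, then join matching partitions pairwise."""
    left_parts = repartition_by_key(left_shards, key, num_partitions)
    right_parts = repartition_by_key(right_shards, key, num_partitions)
    return [local_inner_join(lp, rp, key) for lp, rp in zip(left_parts, right_parts)]


left = [[{"user_id": 1, "answered": 3}], [{"user_id": 2, "answered": 5}]]
right = [[{"user_id": 2, "country": "DE"}, {"user_id": 1, "country": "US"},
          {"user_id": 9, "country": "FR"}]]
result = sharded_inner_join(left, right, "user_id")
rows = sorted((r for part in result for r in part), key=lambda r: r["user_id"])
print(rows)
```

The two sides may start with different shard counts; repartitioning both into the same number of partitions is what makes the pairwise local join correct.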

https://www.kaggle.com/code/ammar111/youtube-trending-videos-analysis

Operations used:
1. fillna, isna, value_counts, count, filter, groupby
2. Nice to have: describe, most_common, corr, sort_values
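
groupby with agg (used in the second and third notebooks, and listed with mean, max and sum under Statistics in the checklist) fits a partial-aggregation pattern: each shard reduces its rows to a small per-group state, and the states are merged afterwards. The subtle part is mean: averaging per-shard means is wrong when shards hold different numbers of rows, so the state carries (count, sum, max) and the mean is computed only at the end. A minimal sketch, assuming each shard is a plain Python list of row dicts (hypothetical, not the Orca shards API); skipping missing keys and values is meant to resemble pandas defaults, not reproduce them exactly.

```python
from typing import Any, Dict, List, Tuple

Shard = List[Dict[str, Any]]  # hypothetical layout: one shard = a list of row dicts
State = Tuple[int, float, float]  # per-group partial state: (count, sum, max)


def partial_agg(shard: Shard, key: str, value: str) -> Dict[Any, State]:
    """Map step: reduce one shard to {group key: (count, sum, max)}."""
    states: Dict[Any, State] = {}
    for row in shard:
        k, v = row.get(key), row.get(value)
        if k is None or v is None:  # skip missing keys and values
            continue
        if k in states:
            count, total, maximum = states[k]
            states[k] = (count + 1, total + v, max(maximum, v))
        else:
            states[k] = (1, v, v)
    return states


def merge_states(a: Dict[Any, State], b: Dict[Any, State]) -> Dict[Any, State]:
    """Reduce step: combine two partial results key by key."""
    merged = dict(a)
    for k, (count, total, maximum) in b.items():
        if k in merged:
            c0, t0, m0 = merged[k]
            merged[k] = (c0 + count, t0 + total, max(m0, maximum))
        else:
            merged[k] = (count, total, maximum)
    return merged


def groupby_agg(shards: List[Shard], key: str, value: str) -> Dict[Any, Dict[str, float]]:
    """Finalize: derive sum, mean and max from the merged states."""
    combined: Dict[Any, State] = {}
    for shard in shards:
        combined = merge_states(combined, partial_agg(shard, key, value))
    return {
        k: {"sum": total, "mean": total / count, "max": maximum}
        for k, (count, total, maximum) in sorted(combined.items())
    }


shards = [
    [{"user": "u1", "score": 1.0}, {"user": "u2", "score": 4.0}],
    [{"user": "u1", "score": 3.0}, {"user": "u1", "score": 5.0}, {"user": "u1", "score": None}],
]
result = groupby_agg(shards, "user", "score")
# u1 mean is 3.0; averaging the per-shard means (1.0 and 4.0) would give 2.5
print(result)
```

Any aggregation whose state merges associatively (count, sum, min, max) works the same way; exact median or quantiles do not, and would need a different strategy.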

https://www.kaggle.com/code/jiashenliu/introduction-to-financial-concepts-and-data

Operations used:
1. filter; converting a pandas Series to a NumPy array and processing it with NumPy operations to create a new column
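
The fourth notebook's pattern (filter rows, then compute a new column from existing ones) is row-wise, so each shard can be processed independently with no data exchanged between shards. A minimal sketch, assuming each shard is a plain Python list of row dicts (hypothetical, not the Orca shards API); the column names stock_id, bid, ask and spread are made up for illustration. In a pandas-backed shard, the per-row func would more typically be a vectorized NumPy expression over the column arrays.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Shard = List[Row]  # hypothetical layout: one shard = a list of row dicts


def filter_and_derive(
    shards: List[Shard],
    predicate: Callable[[Row], bool],
    new_column: str,
    func: Callable[[Row], Any],
) -> List[Shard]:
    """Keep rows matching `predicate` and add `new_column = func(row)`.

    Each shard is processed on its own; nothing is exchanged between shards.
    """
    return [
        [{**row, new_column: func(row)} for row in shard if predicate(row)]
        for shard in shards
    ]


# Hypothetical quote data: keep stock 0 and add a bid/ask spread column.
shards = [
    [{"stock_id": 0, "bid": 1.0, "ask": 1.5}, {"stock_id": 1, "bid": 2.0, "ask": 2.2}],
    [{"stock_id": 0, "bid": 1.0, "ask": 1.25}],
]
out = filter_and_derive(shards, lambda r: r["stock_id"] == 0, "spread", lambda r: r["ask"] - r["bid"])
print(out)
```

Operations that look at neighboring rows, such as diff or shift, are not shard-local in general: they need data partitioned (and ordered) by the relevant key, or extra handling at shard boundaries.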

Sep 08 '22 18:09 dding3

Please summarize, for each example, what additional operations are needed.

Sep 09 '22 00:09 jason-dai
