
[Discussion] Operations needed to be supported in shards

Open dding3 opened this issue 2 years ago • 1 comments

To improve the user experience of Orca shards, this issue was created to discuss which operations Orca shards need to support.

  • [ ] Scaler

    • [x] minmaxscaler
    • [x] standardscaler https://github.com/intel-analytics/BigDL/pull/5716
  • [ ] Encode categorical variables

    • [x] label encoder
    • [ ] onehot encoding (get_dummies in pandas)
  • [ ] Merge (join): has a task at https://github.com/orgs/analytics-zoo/projects/14/views/4

  • [ ] Not (~ operation in pandas)

  • [ ] statistics

    • [ ] missing values
      • [ ] count missing values for each column
      • [ ] delete null values
      • [ ] fill in null values (maybe various imputations)
    • [ ] groupby
    • [ ] agg
    • [ ] mean
    • [ ] max
    • [ ] sum
    • [ ] sort_values (nice to have)
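For reference, the scaler, encoding, and missing-value items above can be sketched in plain pandas. This is only an illustration of the behavior being requested, not the Orca shards API; the toy data and column names (`price`, `color`) are invented for the example.

```python
import pandas as pd

# Toy data, invented purely for illustration.
df = pd.DataFrame({
    "price": [100.0, 200.0, None, 400.0],
    "color": ["red", "blue", "red", None],
})

# Count missing values for each column, largest count first.
missing_counts = df.isnull().sum().sort_values(ascending=False)

# Fill nulls: column mean for the numeric column, a placeholder for the categorical one.
filled = df.fillna({"price": df["price"].mean(), "color": "unknown"})

# Min-max scaling to the range [0, 1].
price = filled["price"]
filled["price_minmax"] = (price - price.min()) / (price.max() - price.min())

# Standard scaling (population standard deviation, ddof=0).
filled["price_std"] = (price - price.mean()) / price.std(ddof=0)

# Label encoding: map each category to an integer code.
filled["color_label"] = filled["color"].astype("category").cat.codes

# One-hot encoding.
one_hot = pd.get_dummies(filled["color"], prefix="color")
```

In a sharded setting, the scalers and the label encoder would presumably need statistics (min, max, mean, the set of categories) gathered across all partitions before being applied per partition, which is what distinguishes them from purely row-wise operations.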

The operations above are motivated by the following notebooks:

https://www.kaggle.com/code/pmarcelino/comprehensive-data-exploration-with-python
operations used:
  1. isnull, sum, sort_values, standard_scaler, get_dummies
  2. nice to have: describe (summary of dataframe), correlation, arg_sort

https://www.kaggle.com/code/isaienkov/riiid-answer-correctness-prediction-eda-modeling
operations used:
  1. isnull, sum, groupby, agg, merge, fillna, not
  2. nice to have: sklearn.feature_selection.rfe
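As a rough picture of how groupby, agg, merge, fillna, and not typically combine in this kind of notebook, here is a minimal plain-pandas sketch. The tables and column names are hypothetical and are not taken from the notebook itself; nothing here is Orca shards code.

```python
import pandas as pd

# Hypothetical tables, invented for illustration.
answers = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "question_id": [10, 11, 10, 12, 11],
    "correct": [True, False, True, True, False],
})
questions = pd.DataFrame({
    "question_id": [10, 11],
    "part": [1, 2],
})

# groupby + agg: per-user answer count and accuracy.
per_user = answers.groupby("user_id").agg(
    n_answers=("correct", "count"),
    accuracy=("correct", "mean"),
).reset_index()

# merge (left join): question 12 has no metadata, so "part" is NaN there.
merged = answers.merge(questions, on="question_id", how="left")

# fillna: replace the missing "part" with a sentinel value.
merged["part"] = merged["part"].fillna(-1).astype(int)

# not (~): select the rows answered incorrectly.
wrong = merged[~merged["correct"]]
```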

https://www.kaggle.com/code/ammar111/youtube-trending-videos-analysis
operations used:
  1. fillna, isna, value_counts, count, filter, groupby
  2. nice to have: describe, most_common, corr, sort_values

https://www.kaggle.com/code/jiashenliu/introduction-to-financial-concepts-and-data
operations used:
  1. filter, converting a pandas Series to a NumPy array and using NumPy operations on it to create a new column
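The last pattern (filter rows, pull a Series out as a NumPy array, compute with NumPy, and write the result back as a new column) might look like the sketch below in plain pandas. The data, the column names, and the choice of log returns as the NumPy computation are assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical price table, invented for illustration.
prices = pd.DataFrame({
    "ticker": ["A", "A", "A", "B", "B", "B"],
    "close": [10.0, 11.0, 12.1, 20.0, 19.0, 20.9],
})

# Filter: keep the rows for a single ticker.
one = prices[prices["ticker"] == "A"].copy()

# Series -> NumPy array.
close = one["close"].to_numpy()

# NumPy computation: log return relative to the previous row (first row has none).
log_ret = np.empty_like(close)
log_ret[0] = np.nan
log_ret[1:] = np.log(close[1:] / close[:-1])

# Write the array back as a new column.
one["log_return"] = log_ret
```

Note that this computation depends on row order and on neighboring rows, so in a partitioned dataset it would not be a purely per-partition operation unless each group sits entirely within one partition.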

dding3 avatar Sep 08 '22 18:09 dding3

For each example, please summarize which additional operations are needed.

jason-dai avatar Sep 09 '22 00:09 jason-dai