Ambrosia Implementation of basic PySpark data preprocessing methods

Open xandaau opened this issue 2 years ago • 1 comment

To preprocess pandas data and speed up experiments, we have the Preprocessor class and a number of base classes that each handle a single preprocessing task. These methods should also be implemented for Spark dataframes, following the same paradigm we already use for the Designer and the Splitter.

At the moment, implementing the following methods is essential:

Aggregation
Outliers removal (robust)
CUPED

Jan 15 '23 16:01 xandaau

The architecture of the preprocessing classes added in #22 still does not account for a possible PySpark implementation.

Jan 31 '23 15:01 xandaau
