
Implementation of basic PySpark data preprocessing methods

Open • xandaau opened this issue on Jan 15 '23 • 1 comment

To preprocess pandas data and speed up experiments, we have the Preprocessor class and a number of base classes, each implementing a single preprocessing step. These methods should also be implemented for Spark DataFrames, following the same paradigm we use for the Designer and the Splitter.

At the moment, the following methods are essential:

  1. Aggregation
  2. Outlier removal (robust)
  3. CUPED

xandaau • Jan 15 '23 16:01

The architecture of the preprocessing classes added in #22 still does not account for a possible PySpark implementation.

xandaau • Jan 31 '23 15:01