
Investigate new data operations for feature engineering and ensembling

Open J3FALL opened this issue 3 years ago • 5 comments

  • Add simple ensembling methods, such as TopModel, WeightedEnsemble and AverageEnsemble
  • Survey the best practices of FE methods for classification/regression on tabular datasets
  • Run experiments with expert-based feature engineering implemented as separate DataOperation blocks
  • Consider special presets of models/operations for classification/regression
  • Try feature generation tools such as featuretools
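
To make the first item concrete, here is a minimal pure-Python sketch of the three ensembling strategies named above. The function names, the use of validation scores as weights, and the sample numbers are illustrative assumptions, not existing FEDOT operations.

```python
from statistics import fmean


def average_ensemble(predictions):
    """Average the per-sample predictions of several models with equal weights."""
    return [fmean(sample) for sample in zip(*predictions)]


def weighted_ensemble(predictions, weights):
    """Weighted average of model predictions; weights are normalised to sum to 1."""
    if len(predictions) != len(weights):
        raise ValueError("exactly one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalised = [w / total for w in weights]
    return [
        sum(w * p for w, p in zip(normalised, sample))
        for sample in zip(*predictions)
    ]


def top_model(predictions, validation_scores):
    """Return the predictions of the model with the best (highest) validation score."""
    best_index = max(range(len(predictions)), key=lambda i: validation_scores[i])
    return predictions[best_index]


# Hypothetical class-1 probabilities from three models for three samples.
model_predictions = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.8],
    [0.8, 0.3, 0.4],
]
# Hypothetical validation scores, reused here as ensemble weights.
scores = [0.71, 0.74, 0.69]

averaged = average_ensemble(model_predictions)
weighted = weighted_ensemble(model_predictions, scores)
best = top_model(model_predictions, scores)
```

Using validation scores as weights is only one possible choice; the weights could equally be fitted on a hold-out set.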

J3FALL avatar May 26 '21 14:05 J3FALL

"Importance Cut off" feature selection effectiventess also can be analysed

nicl-nno avatar May 27 '21 09:05 nicl-nno

Action plan:

  • [x] 1. Add CatBoost and LightGBM with their hyperparameters. DONE
  • [x] 2. Review and outline the approximate algorithm/scheme of how feature engineering works in LAMA. DONE
  • [x] 3. Study how the pipelines differ between LAMA and FEDOT.
  • [x] 4. Run LAMA on the benchmarks (see https://github.com/nicl-nno/automlbenchmark/blob/master/frameworks/FEDOT/exec.py) and bring the necessary differences into FEDOT.
  • [ ] 5. Teach the composer to create such pipelines (or better ones).

MAGLeb avatar Jul 19 '21 06:07 MAGLeb

Both frameworks were trained on the same dataset. Each framework was trained 8 times and the metrics were averaged:

FEDOT (AUC)

5 minutes: train 0.7995216483735391, test 0.7141597316576087

10 minutes: train 0.8050606503972741, test 0.7121297554347826

20 minutes: train 0.7735015904571719, test 0.723378269361413

LAMA (AUC)

~1 minute [40, 50, ]: train 0.6866954923298692, test 0.7107557744565218

For FEDOT, the date and categorical features had to be preprocessed beforehand.
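
The comment does not say which preprocessing was applied, so the following is only a sketch of one common approach: expanding an ISO date into numeric parts and ordinal-encoding a categorical column. The column names `signup` and `plan` and all helper names are hypothetical.

```python
from datetime import date


def expand_date(value):
    """Split an ISO date string into numeric features a model can consume."""
    parsed = date.fromisoformat(value)
    return {
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "weekday": parsed.weekday(),  # Monday is 0
    }


def fit_ordinal_encoder(values):
    """Map each distinct category to an integer, in sorted order."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def encode(values, mapping, unknown=-1):
    """Apply a fitted mapping; categories unseen during fitting get `unknown`."""
    return [mapping.get(value, unknown) for value in values]


# Hypothetical raw rows with one date column and one categorical column.
rows = [
    {"signup": "2021-07-19", "plan": "basic"},
    {"signup": "2021-07-21", "plan": "premium"},
    {"signup": "2021-05-26", "plan": "basic"},
]

plan_mapping = fit_ordinal_encoder(row["plan"] for row in rows)
prepared = []
for row in rows:
    features = expand_date(row["signup"])
    features["plan"] = plan_mapping[row["plan"]]
    prepared.append(features)
```

One-hot encoding would be an equally valid choice for the categorical column; ordinal encoding is used here only to keep the sketch short.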

MAGLeb avatar Jul 21 '21 04:07 MAGLeb

Will any automatic genetic feature engineering across multiple features be added, such as feature1*lag(feature2,10)? I ask because I see there is a genetic algorithm in FEDOT.
ATOM (https://github.com/tvdboom/ATOM) provides such a process via gplearn; however, its operator set is very small.
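
For clarity, a feature of the form feature1*lag(feature2, k) could be computed as in the sketch below. The helper names and the sample series are illustrative assumptions; this is not an existing FEDOT, ATOM or gplearn operator.

```python
def lag(series, steps, fill=None):
    """Shift a series forward by `steps` positions, padding the start with `fill`."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    values = list(series)
    kept = values[:max(len(values) - steps, 0)]
    return [fill] * min(steps, len(values)) + kept


def product_with_lag(feature1, feature2, steps):
    """Compute feature1[t] * feature2[t - steps]; undefined positions stay None."""
    lagged = lag(feature2, steps)
    return [None if b is None else a * b for a, b in zip(feature1, lagged)]


# Hypothetical series; a lag of 2 is used instead of 10 to keep the example short.
feature1 = [1.0, 2.0, 3.0, 4.0, 5.0]
feature2 = [10.0, 20.0, 30.0, 40.0, 50.0]
combined = product_with_lag(feature1, feature2, steps=2)
```

A genetic search over such features would treat `lag` and multiplication as operators and the lag length as a parameter to evolve.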

graceyangfan avatar Mar 13 '22 10:03 graceyangfan

@graceyangfan

We did not plan to design features with the GA itself. However, we use existing feature generators such as poly_features and optimize their hyperparameters during evolution and tuning.
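
To illustrate what a polynomial feature generator with a tunable hyperparameter does, here is a standalone sketch in which `degree` plays the role of the hyperparameter. It is not FEDOT's poly_features implementation; the function name and the candidate degrees are assumptions.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree=2):
    """Return all products of the row's values up to `degree` (no bias term)."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    generated = []
    for order in range(1, degree + 1):
        for combo in combinations_with_replacement(row, order):
            generated.append(prod(combo))
    return generated


row = [2.0, 3.0]
second_order = polynomial_features(row, degree=2)

# A tuner would treat `degree` as a hyperparameter; here we only show how
# the number of generated features grows with it.
feature_counts = {d: len(polynomial_features(row, degree=d)) for d in (1, 2, 3)}
```

In a real search, each candidate degree would be scored by the validation metric of the downstream model rather than by the feature count.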

nicl-nno avatar Jul 01 '22 11:07 nicl-nno