Missing feature transformers and shallow pipelines

Open Genises opened this issue 6 years ago • 2 comments

Hello, I want to use TPOT for feature engineering. To that end, I fixed the model in TPOT (e.g. to a linear regression model) and otherwise used the default configuration.

Given some features {x1, x2, …}, there are no feature transformation steps/operators in TPOT that could produce new features such as 5 * (x2 + log(x1))**3 or even just abs(x1 - x2), right?
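To make the kind of feature meant here concrete, below is a minimal sketch outside TPOT, using plain scikit-learn. The helper name `handcrafted_features`, the synthetic data, and the value ranges are assumptions chosen for illustration only; they are not part of TPOT.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def handcrafted_features(X):
    """Append abs(x1 - x2) and (x2 + log(x1))**3 to the raw columns (hypothetical helper)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([X, np.abs(x1 - x2), (x2 + np.log(x1)) ** 3])


# Synthetic data with a known target function; x1 > 0 so log(x1) is defined.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 3.0, size=(200, 2))
y = 5 * (X[:, 1] + np.log(X[:, 0])) ** 3

# Linear regression on the raw features versus on the hand-built features.
baseline = LinearRegression().fit(X, y)
engineered = make_pipeline(
    FunctionTransformer(handcrafted_features), LinearRegression()
).fit(X, y)

r2_baseline = baseline.score(X, y)
r2_engineered = engineered.score(X, y)
print(f"raw features R^2:        {r2_baseline:.4f}")
print(f"engineered features R^2: {r2_engineered:.4f}")
```

With the right constructed feature in place, the fixed linear model can represent the target exactly, which is the behaviour I would like TPOT to find on its own.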

Testing TPOT on synthetic data (where I know the target function) often results in many more, seemingly overly complex features, e.g. those produced by a single RBFSampler operator.

Also, even if such non-linear feature transformation operators (|x|, exp(x), sin(x), cos(x), abs(x)) together with combination operators (+, −, ·) were part of TPOT, could a feature like 5 * (x2 + log(x1))**3 even be constructed? All my initial pipelines are very shallow and, due to the multi-objective optimisation and the greedy evolutionary approach, they do not get bigger. What about scenarios where multiple operators would need to be introduced at the same time to improve accuracy and become part of the Pareto front?
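The concern can be sketched with a toy Pareto check over two objectives: fewer operators and a higher score. The pipeline names and the scores below are made-up illustrative numbers, not measured TPOT results, and `pareto_front` is a hypothetical helper rather than TPOT's actual selection code.

```python
def pareto_front(candidates):
    """Return names of candidates that no other candidate dominates.

    A candidate is dominated if another one has at most as many operators
    and at least as high a score, and is strictly better in one of the two.
    """
    front = []
    for name, n_ops, score in candidates:
        dominated = any(
            o_ops <= n_ops
            and o_score >= score
            and (o_ops < n_ops or o_score > score)
            for _, o_ops, o_score in candidates
        )
        if not dominated:
            front.append(name)
    return front


# (name, number of operators, score); scores are illustrative only.
candidates = [
    ("linear", 1, 0.90),
    ("log -> linear", 2, 0.90),              # one step toward the target, no gain yet
    ("log, add -> linear", 3, 0.90),         # two steps, still no gain
    ("log, add, cube -> linear", 4, 1.00),   # the full feature finally pays off
]

front = pareto_front(candidates)
print(front)
```

In this toy setting both intermediate pipelines are dominated by the shallow one, so the deep pipeline only reaches the front if all of its operators appear in a single step.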

Genises avatar Mar 10 '20 16:03 Genises

Those complex combinations of feature transformations are not supported in TPOT. I think ColumnTransformer would be needed for the idea in this issue. Also, as mentioned above, because the multi-objective optimization penalizes pipelines with a large number of operators and limited improvement in score, the selection function or the calculation of pipeline complexity would have to be changed. We don't have a near-term plan to include this functionality, but any contributions are welcome.
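For reference, here is a minimal sketch of per-column transformations with scikit-learn's ColumnTransformer on its own, outside TPOT. The step names, the `abs_diff` helper, and the sample data are assumptions for illustration; nothing here is TPOT API.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer


def abs_diff(X):
    """Return abs(x1 - x2) as a single column (hypothetical helper)."""
    return np.abs(X[:, [0]] - X[:, [1]])


# Each entry applies one transformer to the listed columns;
# the outputs are concatenated in the order given here.
features = ColumnTransformer(
    transformers=[
        ("log_x1", FunctionTransformer(np.log), [0]),
        ("abs_diff", FunctionTransformer(abs_diff), [0, 1]),
        ("raw", "passthrough", [0, 1]),
    ]
)

X = np.array([[1.0, 2.0], [np.e, 0.5]])
out = features.fit_transform(X)
print(out.shape)
```

Nested expressions such as (x2 + log(x1))**3 would still need chained or custom transformers on top of this, which is where the pipeline-complexity penalty comes in.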

weixuanfu avatar Mar 10 '20 17:03 weixuanfu

I need complex feature transformations for my current project. I can use ColumnTransformer in sklearn, but it does not seem to be supported in TPOT yet.

I really need it!

ruialcn avatar Jul 03 '22 16:07 ruialcn