Pedro Ribeiro
Currently, successive halving reduces computation by subsampling the number of rows. Every time the budget increases, the existing parent population is re-evaluated at the next budget, which may be inefficient....
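A minimal sketch of the pattern described above, using generic successive halving rather than TPOT2's actual code. The names `subsample_rows`, `successive_halving`, `score`, and `budgets` are hypothetical. The evaluation counter shows the re-evaluation cost: every survivor is scored again from scratch each time the budget grows.

```python
import random


def subsample_rows(rows, budget, rng):
    """Return a random subset holding a `budget` fraction of the rows."""
    n = max(1, int(len(rows) * budget))
    return rng.sample(rows, n)


def successive_halving(population, rows, score, budgets, seed=0):
    """Keep the better half of the population at each budget level.

    `score(individual, rows)` is a hypothetical objective (higher is better).
    Returns the survivors and the total number of score evaluations.
    """
    rng = random.Random(seed)
    survivors = list(population)
    n_evaluations = 0
    for budget in budgets:
        subset = subsample_rows(rows, budget, rng)
        # Every survivor is re-evaluated at the new budget; nothing is reused.
        scored = [(score(ind, subset), ind) for ind in survivors]
        n_evaluations += len(scored)
        scored.sort(key=lambda pair: pair[0], reverse=True)
        keep = max(1, len(scored) // 2)
        survivors = [ind for _, ind in scored[:keep]]
    return survivors, n_evaluations
```

With four individuals and three budget levels this performs 4 + 2 + 1 evaluations, so any saving would have to come from reusing work done at earlier budgets.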
**Regressors:** We include ElasticNetCV, LassoLarsCV, SGDRegressor, and RidgeCV. These are all essentially variants of linear regression. Let's remove all but SGDRegressor (or maybe ElasticNetCV). SGDRegressor can also potentially overlap with SVR. Or...
get_pareto_frontier is faster because it computes only the best Pareto front. (get_pareto_front calculates all of the fronts, not just the best one.)
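To illustrate why computing only the best front is cheaper, here is a small sketch assuming all objectives are maximized. `first_front` and `all_fronts` are hypothetical stand-ins and not the actual implementations of get_pareto_frontier and get_pareto_front.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def first_front(scores):
    """Indices of the non-dominated points only (a single pass)."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def all_fronts(scores):
    """Peel off successive non-dominated fronts until every point is ranked."""
    remaining = list(range(len(scores)))
    fronts = []
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

`all_fronts` repeats the non-dominated check once per front, so it does strictly more work than `first_front` whenever some point is dominated.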
During the evolutionary algorithm, TPOT2 will sometimes fit the same estimator on exactly the same data more than once. Ideally, we should be able to catch this and use a cached version of the...
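One way such a cache could work is sketched below. It assumes that an estimator is identified by its name and hyperparameters and that the data can be pickled. `FitCache` and `fit_fn` are hypothetical names and not part of TPOT2 or scikit-learn.

```python
import hashlib
import pickle


class FitCache:
    """Memoize fitted models keyed on (estimator name, params, data)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(estimator_name, params, X, y):
        # Pickle bytes are only a rough content fingerprint: equal inputs can
        # occasionally serialize differently, which costs a cache miss but
        # never returns a model fitted on different inputs.
        payload = pickle.dumps((estimator_name, sorted(params.items()), X, y))
        return hashlib.sha256(payload).hexdigest()

    def fit(self, estimator_name, params, X, y, fit_fn):
        """Return the cached model, or call `fit_fn(params, X, y)` once."""
        key = self._key(estimator_name, params, X, y)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        model = fit_fn(params, X, y)
        self._store[key] = model
        return model
```

For NumPy arrays, a dedicated content hash such as `joblib.hash` would likely make a more robust key than raw pickle bytes.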
Similar to the approach in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7528347/ where covariate adjustment was implemented with custom sklearn module wrappers. That is a somewhat hacky workaround that would not be ideal for TPOT2. Instead, this...
It looks at the evaluated_individuals log and generates a plot of time or generations vs. each objective function. This would be helpful for debugging and may provide a visual indication...
Graph pipeline plots could be improved. The graphviz package may be worth looking into. Other DAG-based sklearn pipeline packages, such as Baikal and FEDOT, have nice graphs.
We could have better thread efficiency for early stopping with thresholds (when not used with selection early stopping). Currently, all threads need to come together at the end of each...
This should be rare. The order of inputs is determined alphabetically. For this to be an issue we would have to have two modules of the same type, each with...
See https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectorMixin.html