Pedro Ribeiro issues

53 issues



Idea for Successive Halving to avoid repeated re-evaluations of the same pipeline.

Currently, successive halving reduces computation by evaluating pipelines on a subsample of the rows. Every time the budget increases, the existing parent population is re-evaluated at the larger budget, which may be inefficient...

enhancement
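To make the re-evaluation cost described in this issue measurable, here is a small sketch that counts how many evaluations in a successive-halving run are repeats of a pipeline already scored at a smaller budget. The log format (one list of pipeline identifiers per budget step) and the function name are hypothetical, not TPOT2's actual data structures.

```python
from typing import Hashable, Sequence


def count_repeat_evaluations(steps: Sequence[Sequence[Hashable]]) -> int:
    """Count evaluations of pipelines that were already scored at an earlier budget step.

    `steps` is a hypothetical log: one list of pipeline identifiers per budget step,
    in increasing budget order (for example 10%, 50%, 100% of the rows).
    """
    seen = set()
    repeats = 0
    for pipeline_ids in steps:
        for pid in pipeline_ids:
            if pid in seen:
                repeats += 1  # this pipeline is being re-fit on a larger subsample
            seen.add(pid)
    return repeats


# Toy log: parents "a" and "b" survive into step 2, "a" and "e" into step 3.
steps = [
    ["a", "b", "c", "d"],  # e.g. 10% of rows
    ["a", "b", "e", "f"],  # e.g. 50% of rows
    ["a", "e", "g", "h"],  # e.g. 100% of rows
]
print(count_repeat_evaluations(steps))  # 4 of the 12 evaluations are re-evaluations
```

Not every repeat is wasted work, since a larger budget gives a better estimate; the count only bounds how much computation an alternative scheme could save.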

edit/consolidate default configuration dictionary

**Regressors:** We include ElasticNetCV, LassoLarsCV, SGDRegressor, and RidgeCV. These are all essentially regularized linear regression. Let's remove all but SGDRegressor (or maybe ElasticNetCV). SGDRegressor can also potentially overlap with SVR (with loss="epsilon_insensitive" it optimizes the same loss as a linear SVR). Or...

enhancement
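A minimal sketch of the proposed consolidation, assuming the default configuration can be treated as a mapping from estimator name to hyperparameter search space. The dictionary contents, the `consolidate` helper, and the `keep` parameter are hypothetical and only illustrate the idea; the real TPOT2 configuration format may differ.

```python
# Hypothetical stand-in for the regressor section of a default configuration:
# estimator name -> hyperparameter search space (values are illustrative only).
DEFAULT_REGRESSORS = {
    "ElasticNetCV": {"l1_ratio": [0.1, 0.5, 0.9]},
    "LassoLarsCV": {},
    "SGDRegressor": {"alpha": [1e-5, 1e-4, 1e-3], "penalty": ["l2", "l1", "elasticnet"]},
    "RidgeCV": {},
    "SVR": {"C": [0.1, 1.0, 10.0]},
}

# Estimators that are all regularized linear regression under the hood.
REDUNDANT_LINEAR = {"ElasticNetCV", "LassoLarsCV", "SGDRegressor", "RidgeCV"}


def consolidate(config, keep="SGDRegressor"):
    """Drop every redundant linear regressor except `keep` (hypothetical helper)."""
    if keep not in REDUNDANT_LINEAR:
        raise ValueError(f"{keep!r} is not one of the linear regressors")
    return {
        name: space
        for name, space in config.items()
        if name not in REDUNDANT_LINEAR or name == keep
    }


print(sorted(consolidate(DEFAULT_REGRESSORS)))  # ['SGDRegressor', 'SVR']
```

Passing `keep="ElasticNetCV"` covers the alternative mentioned in the issue.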

replace instances of get_pareto_front with get_pareto_frontier

get_pareto_frontier is faster because it computes only the best (first, non-dominated) Pareto front, whereas get_pareto_front calculates all fronts, not just the best one.

enhancement
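The cost difference can be seen in a plain-Python sketch: the first front needs one round of pairwise dominance checks, while finding all fronts by repeatedly peeling off the non-dominated set repeats that work once per front. The function names here are illustrative and are not TPOT2's implementations; the sketch assumes every objective is maximized.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly better on one.

    Assumes every objective is maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def first_front(points: Sequence[Point]) -> List[Point]:
    """Only the best (non-dominated) front: one pass of pairwise comparisons."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def all_fronts(points: Sequence[Point]) -> List[List[Point]]:
    """Every front, found by repeatedly peeling off the current non-dominated set."""
    remaining = list(points)
    fronts = []
    while remaining:
        front = first_front(remaining)
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts


points = [(1, 5), (2, 4), (3, 3), (1, 1), (2, 2), (0, 0)]
print(first_front(points))     # [(1, 5), (2, 4), (3, 3)]
print(len(all_fronts(points)))  # 4
```

With n points and m objectives, `first_front` costs O(m·n²), while this naive peeling can cost O(m·n³) when there are many fronts. NSGA-II's fast non-dominated sort brings the all-fronts case down to O(m·n²), but computing only the first front is still cheaper when that is all the caller needs.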

Avoiding repeated work: cache node models with node hashing or manually.

During the evolutionary algorithm, TPOT2 will fit the same estimator to exactly the same data multiple times. Ideally, we should be able to catch this and use a cached version of the...

enhancement
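One way to catch repeated fits is to key a cache on a hash of the estimator name, its hyperparameters, and the training data. The sketch below is a hypothetical `FitCache` helper with a toy "mean predictor" standing in for a real estimator. It is not TPOT2's mechanism. For transformers inside a plain sklearn `Pipeline`, the existing `memory` argument, which caches via joblib, is a related, real option.

```python
import hashlib
import pickle
from typing import Any, Callable, Dict


class FitCache:
    """Memoize fitted models keyed by (estimator name, hyperparameters, training data).

    Hypothetical helper: TPOT2's real caching mechanism may differ.
    """

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0

    @staticmethod
    def key(name: str, params: Dict[str, Any], X: Any, y: Any) -> str:
        # Sort params so {"a": 1, "b": 2} and {"b": 2, "a": 1} hash identically.
        # Equal data built differently can occasionally pickle differently, which
        # causes a harmless cache miss, never a wrong hit.
        payload = pickle.dumps((name, sorted(params.items()), X, y))
        return hashlib.sha256(payload).hexdigest()

    def fit(self, name: str, params: Dict[str, Any], X: Any, y: Any,
            fit_fn: Callable[[Dict[str, Any], Any, Any], Any]) -> Any:
        k = self.key(name, params, X, y)
        if k in self._store:
            self.hits += 1  # same estimator, same params, same data: reuse
            return self._store[k]
        model = fit_fn(params, X, y)
        self._store[k] = model
        return model


def fit_mean_predictor(params, X, y):
    """Toy 'estimator' whose fitted state is just the mean of y."""
    return sum(y) / len(y)


cache = FitCache()
X, y = [[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0]
cache.fit("MeanPredictor", {}, X, y, fit_mean_predictor)
cache.fit("MeanPredictor", {}, X, y, fit_mean_predictor)  # served from cache
print(cache.hits)  # 1
```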

add covariate adjustment

Similar to the approach in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7528347/ where the authors implemented covariate adjustment with custom sklearn module wrappers. That is a somewhat hacky workaround that would not be ideal for TPOT2. Instead, this...

enhancement
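For reference, here is a plain-Python sketch of one common form of covariate adjustment: regressing a single covariate out of a feature. It is a generic technique, not necessarily the method of the linked paper or the design intended for TPOT2. The sketch shows the detail that makes in-pipeline support useful: coefficients are learned on training rows only and then applied unchanged to held-out rows, so no information leaks across the split. The function names and the "age" covariate are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def fit_residualizer(feature: Sequence[float], covariate: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of feature ~ intercept + slope * covariate (training rows only)."""
    n = len(feature)
    mean_c = sum(covariate) / n
    mean_f = sum(feature) / n
    s_cc = sum((c - mean_c) ** 2 for c in covariate)
    s_cf = sum((c - mean_c) * (f - mean_f) for c, f in zip(covariate, feature))
    slope = s_cf / s_cc if s_cc else 0.0  # constant covariate: nothing to remove
    return slope, mean_f - slope * mean_c


def residualize(feature: Sequence[float], covariate: Sequence[float],
                slope: float, intercept: float) -> List[float]:
    """Remove the covariate's linear effect using coefficients learned on training rows."""
    return [f - (intercept + slope * c) for f, c in zip(feature, covariate)]


# Training rows: the feature is entirely explained by the covariate (e.g. age).
age_train = [20.0, 30.0, 40.0, 50.0]
feat_train = [2.0 * a + 1.0 for a in age_train]
slope, intercept = fit_residualizer(feat_train, age_train)

# Held-out rows are adjusted with the *training* coefficients, never refit.
age_test = [25.0, 60.0]
feat_test = [52.0, 121.0]
print(residualize(feat_test, age_test, slope, intercept))  # [1.0, 0.0]
```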

function that generates a learning curve

It would read the evaluated_individuals log and plot each objective function against time or generation. This would be helpful for debugging and may provide a visual indication...

enhancement
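The data behind such a plot could be computed as shown below: the best objective value per generation, plus a best-so-far curve. The sketch assumes the log can be iterated as records with a `"generation"` key and one key per objective. The real evaluated_individuals log may be a DataFrame with different column names, and the helper names are hypothetical. The resulting lists could be passed to a plotting library such as matplotlib; plotting is left out to keep the sketch dependency-free.

```python
from typing import Dict, Iterable, List, Mapping


def best_per_generation(log: Iterable[Mapping], objective: str,
                        maximize: bool = True) -> Dict[int, float]:
    """Best value of `objective` in each generation, skipping failed (None) evaluations."""
    best: Dict[int, float] = {}
    for row in log:
        value = row.get(objective)
        if value is None:
            continue
        gen = row["generation"]
        if gen not in best or (value > best[gen] if maximize else value < best[gen]):
            best[gen] = value
    return dict(sorted(best.items()))


def running_best(per_gen: Dict[int, float], maximize: bool = True) -> List[float]:
    """Best-so-far curve: the usual y-values for a learning-curve plot."""
    pick = max if maximize else min
    curve: List[float] = []
    for value in per_gen.values():
        curve.append(value if not curve else pick(curve[-1], value))
    return curve


log = [
    {"generation": 0, "roc_auc": 0.70},
    {"generation": 0, "roc_auc": 0.80},
    {"generation": 1, "roc_auc": 0.75},
    {"generation": 1, "roc_auc": None},  # e.g. a pipeline that failed or timed out
    {"generation": 2, "roc_auc": 0.90},
]
per_gen = best_per_generation(log, "roc_auc")
print(per_gen)                # {0: 0.8, 1: 0.75, 2: 0.9}
print(running_best(per_gen))  # [0.8, 0.8, 0.9]
```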

better plotting for graph pipeline

Graph pipeline plots could be improved. The graphviz package may be worth looking into. Other DAG-based sklearn pipeline packages, for example Baikal and FEDOT, have nice graph visualizations.

enhancement
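As a starting point that does not require the graphviz Python package, a pipeline DAG can be emitted as Graphviz DOT text. That text can then be rendered with the `dot` command-line tool (for example `dot -Tpng pipeline.dot -o pipeline.png`). The edge-dictionary input format and the `to_dot` name are assumptions for illustration, not GraphPipeline's API.

```python
from typing import Dict, List


def to_dot(edges: Dict[str, List[str]], name: str = "pipeline") -> str:
    """Render a pipeline DAG as Graphviz DOT text.

    `edges` maps each node to the nodes that consume its output (hypothetical format).
    Node names containing double quotes would need escaping; that is omitted here.
    """
    lines = [f"digraph {name} {{", "  rankdir=LR;"]  # draw left to right
    nodes = set(edges) | {dst for dsts in edges.values() for dst in dsts}
    for node in sorted(nodes):
        lines.append(f'  "{node}" [shape=box];')
    for src in sorted(edges):
        for dst in edges[src]:
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dag = {
    "StandardScaler": ["PCA", "LogisticRegression"],
    "PCA": ["LogisticRegression"],
}
print(to_dot(dag))
```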

better thread efficiency for CV pruning with thresholds

We could improve thread efficiency for early stopping with thresholds (when it is not combined with selection early stopping). Currently, all threads need to synchronize at the end of each...

enhancement
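A sketch of one possible direction, not necessarily the one this issue goes on to propose: if the per-fold thresholds are already known (fixed values rather than values derived from the current population's scores), each individual can run its own fold loop and stop independently. A pruned individual then frees its worker immediately instead of waiting at a per-fold barrier. The precomputed fold scores, the thresholds, and the function name are toy assumptions, and a thread pool stands in for TPOT2's actual parallel backend.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional, Sequence, Tuple


def cv_with_threshold_pruning(fold_scores: Sequence[float],
                              thresholds: Sequence[float]) -> Tuple[Optional[float], int]:
    """Evaluate folds one at a time and stop once the running mean falls below
    that fold's threshold. Returns (mean score, or None if pruned; folds used)."""
    total = 0.0
    for k, score in enumerate(fold_scores):
        total += score  # in practice: fit and score fold k here
        if k < len(thresholds) and total / (k + 1) < thresholds[k]:
            return None, k + 1
    return total / len(fold_scores), len(fold_scores)


# Precomputed fold scores stand in for real model fits (toy data).
individuals = [
    [0.90, 0.85, 0.88],
    [0.50, 0.60, 0.55],
    [0.80, 0.70, 0.75],
]
thresholds = [0.60, 0.65]  # assumed fixed, i.e. not derived from the current population

# Each individual runs its own fold loop, so there is no barrier between folds.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda s: cv_with_threshold_pruning(s, thresholds), individuals))
print(results)
```

If the thresholds depend on how the rest of the population scored on the same fold, some synchronization is unavoidable; the sketch only removes it for the fixed-threshold case.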

Memory caching with GraphPipeline may miss some nodes when the order of inputs differs between two otherwise identical nodes. Potential solution: make graphs canonical; there is a C package that could be imported into TPOT that may do this.

This should be rare. The order of inputs is determined alphabetically. For this to be an issue, we would need two modules of the same type, each with...

enhancement
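A lighter-weight alternative to full graph canonization is a Merkle-style hash. Each node is hashed from its own label (type and hyperparameters) plus the *sorted* hashes of its inputs. The result then does not depend on input order or on the names the nodes happen to carry. The graph and label dictionaries below are a hypothetical representation, not GraphPipeline's internals. The approach is only sound for nodes whose output does not depend on input order; a node that concatenates its inputs, for example, would produce different column orders.

```python
import hashlib
from typing import Dict, List, Optional


def canonical_hash(node: str, inputs: Dict[str, List[str]], labels: Dict[str, str],
                   memo: Optional[Dict[str, str]] = None) -> str:
    """Hash a node from its label (type + params) and its inputs' hashes.

    Input hashes are sorted, so the result ignores input order and node names.
    Assumes the graph is acyclic.
    """
    if memo is None:
        memo = {}
    if node not in memo:
        child_hashes = sorted(
            canonical_hash(parent, inputs, labels, memo) for parent in inputs.get(node, [])
        )
        text = labels[node] + "(" + ",".join(child_hashes) + ")"
        memo[node] = hashlib.sha256(text.encode()).hexdigest()
    return memo[node]


# Same structure, but the final node's inputs are listed (and named) differently.
g1 = {"clf": ["scaler_1", "pca_1"]}
labels1 = {"clf": "LogisticRegression(C=1.0)", "scaler_1": "StandardScaler()",
           "pca_1": "PCA(n_components=2)"}
g2 = {"clf": ["pca_b", "scaler_a"]}
labels2 = {"clf": "LogisticRegression(C=1.0)", "scaler_a": "StandardScaler()",
           "pca_b": "PCA(n_components=2)"}
print(canonical_hash("clf", g1, labels1) == canonical_hash("clf", g2, labels2))  # True
```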

FeatureSetSelector module needs to implement a mask to be consistent with sklearn

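Subclasses of sklearn's SelectorMixin implement `_get_support_mask()`, which returns a boolean mask over the input features; `get_support()` and `transform()` are built on top of that mask. See https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectorMixin.html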

bug
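The core of the fix is computing that mask. The dependency-free sketch below assumes the selector is configured with a list of column names. It builds the boolean mask and applies it the way `transform()` would. In an actual SelectorMixin subclass the mask would be returned from `_get_support_mask()` as a NumPy boolean array; `support_mask` and `apply_mask` are hypothetical helper names.

```python
from typing import List, Sequence


def support_mask(feature_names: Sequence[str], selected: Sequence[str]) -> List[bool]:
    """Boolean mask over the input columns, True where the column is kept."""
    wanted = set(selected)
    missing = wanted - set(feature_names)
    if missing:
        raise ValueError(f"unknown features: {sorted(missing)}")
    return [name in wanted for name in feature_names]


def apply_mask(rows: Sequence[Sequence[float]], mask: Sequence[bool]) -> List[List[float]]:
    """Keep only the masked-in columns of each row (what transform() would do)."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


columns = ["age", "bmi", "gene_a", "gene_b"]
mask = support_mask(columns, ["gene_a", "gene_b"])
print(mask)                                      # [False, False, True, True]
print(apply_mask([[50, 22.5, 0.1, 0.7]], mask))  # [[0.1, 0.7]]
```

Exposing the mask is what lets the inherited SelectorMixin methods such as `get_support()` and `get_feature_names_out()` behave consistently with other sklearn selectors.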