tpot2

Doc2

Open perib opened this issue 4 months ago • 1 comments

This PR builds on my previous PR. Because it is branched from that PR, it includes those changes as well.

Various edits, additions, and rewrites to all tutorials, including rewritten sections of tutorials 3 and 4.

Added docstrings to various classes and functions. Edited existing documentation to make some ideas clearer.

Some code cleanup, including comments, docstrings, and renamed variables.

renamed params/classes:

  • early_stop_seconds -> early_stop_mins
  • SklearnIndividualGenerator -> SearchSpace (This is the base search space class that users will interact with.)
  • GraphPipeline (the search space) -> GraphSearchPipeline. I realized we had two GraphPipeline classes, one for the search space and one for the sklearn estimator.
  • renamed the cv early stopping params to cv pruning to avoid confusion with early stopping on the entire optimization process

fixes including:

  • replaced configspace.add_hyperparameter(s), configspace.add_condition(s) with "configspace.add" (deprecation fix)
  • fixed mdr search space to correctly create the regression version of the pipeline
  • fixed a bug where max_eval_time_mins was being evaluated as seconds instead of minutes.
  • FSSNode will no longer select the current feature set during mutation. Mutation will always select a new feature set.
  • fixed a bug in the base individual class that prevented reproducibility. Added some additional rng = default_rng(rng) checks just in case.
  • removed unused params from tpotestimator
  • a few fixes to the hyperparameter search spaces. Some tree regressors needed to have their criterion options updated to the correct strings for the latest version.
  • fixed an issue where the gradual version of estimator node was accidentally removing categorical/fixed parameters from the params dictionary.
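The FSSNode and reproducibility fixes above can be sketched together. This is only an illustration, not TPOT's actual implementation: `mutate_feature_set` and its arguments are hypothetical names, and the feature sets are plain strings. It assumes NumPy's `default_rng`, which accepts `None`, an integer seed, or an existing `Generator` and returns an existing `Generator` unchanged.

```python
import numpy as np


def mutate_feature_set(current, candidates, rng=None):
    """Pick a feature set from `candidates` that differs from `current`.

    Hypothetical helper for illustration; not part of TPOT.
    """
    # Normalize the rng argument: None, an int seed, or a Generator all work.
    # Passing an existing Generator returns that same object, so callers that
    # share one Generator keep a single reproducible random stream.
    rng = np.random.default_rng(rng)

    # Exclude the current selection so the mutation always changes something.
    options = [name for name in candidates if name != current]
    if not options:
        # Only one feature set exists, so there is nothing new to select.
        return current

    return options[rng.integers(len(options))]


# Same seed, same result: the mutation is reproducible.
first = mutate_feature_set("set_a", ["set_a", "set_b", "set_c"], rng=42)
second = mutate_feature_set("set_a", ["set_a", "set_b", "set_c"], rng=42)
```

Calling `default_rng(rng)` at the top of every function that takes an `rng` argument is what makes seeding with an integer and passing a shared `Generator` behave consistently.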

changes including:

  • replaced "PassKBinsDiscretizer" with "KBinsDiscretizer". The former also passed through the input data, but this is unnecessary now that we have feature unions and the standalone passthrough class
  • Changed a few default values to what should normally be used (initial defaults were mostly about making it faster to run rather than what is likely better).
  • default cv is 10
  • default search space is linear and not linear-light
  • removed memory limit by default
  • added genetic encoders and the other autoqtl transformers/selectors to get configspace
  • added an iterative imputer option that also simultaneously learns the parameters of the inner estimator
  • removed cross_val_predict_cv from the estimators. use wrapper pipeline instead (or set in graphsearchpipeline).
  • added cross_val_predict_cv to the template search spaces
  • edited some of the docs, like contribute.md and issue template.md
  • added **kwargs to the export_pipeline functions. Functions that return an sklearn pipeline or a graphpipeline now have options for passing a memory parameter and cross_val_predict_cv. This can be used to pass through TPOT's memory parameter to the final pipelines.
  • added template search spaces for linear-light, graph-light which are lighter/faster versions of linear and graph. added mdr search space
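As a rough illustration of the **kwargs change to the export_pipeline functions, the sketch below forwards keyword arguments to scikit-learn's `Pipeline`, whose `memory` parameter enables caching of fitted transformers. `export_pipeline_sketch` is a hypothetical stand-in, not TPOT's function, and only `memory` is shown because `cross_val_predict_cv` is not a parameter of sklearn's `Pipeline`.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def export_pipeline_sketch(steps, **kwargs):
    """Build an sklearn Pipeline, forwarding extra options such as `memory`.

    Hypothetical stand-in for illustration; not TPOT's export_pipeline.
    """
    # Any keyword accepted by sklearn's Pipeline (e.g. memory) passes through.
    return Pipeline(steps, **kwargs)


pipe = export_pipeline_sketch(
    [("scale", StandardScaler()), ("clf", LogisticRegression())],
    # Directory path used for caching fitted transformers; it is only
    # touched when the pipeline is fitted.
    memory="./pipeline_cache",
)
```

Forwarding keyword arguments this way keeps the export function's signature stable while letting callers reuse the same cache location for the final pipeline.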

perib, Sep 29 '24 15:09