tpot icon indicating copy to clipboard operation
tpot copied to clipboard

Add algorithm options of KNeighborRegressor/Classifier; Add SelectFdr to selectors

Open KirkDCO opened this issue 9 months ago • 0 comments

What does this PR do?

  1. When KNeighborRegressor or KNeighborClassifier are run with a high-dimensional dataset, memory usage spikes and eventually caused an out of memory error and failure. Added an algorithm option with ["kd_tree", and "ball_tree"] prevent this from happening.
  2. Added sklearn.SelectFdr as feature selector to allow feature selection by False Discovery Rate.

Where should the reviewer start?

config/regressor.py config/classifier.py

How should this PR be tested?

Existing KNeighborRegressor tests should pass.
Adding an option to select by FDR.
Both have been tested in my hands extensively without any problems.

Any background context you want to provide?

When running nosetests with scikit-learn 1.4.2, export tests fail due to a deprecated function usage which is not related to this PR. All other tests pass.

What are the relevant issues?

[you can link directly to issues by entering # then the number of the issue, for example, #3 links to issue 3]

Screenshots (if appropriate)

Questions:

  • Do the docs need to be updated?

Addition of selectFdr to documentation Maybe addition of algorithm details for KNeighborRegressor and KNeighborClassifier

  • Does this PR add new (Python) dependencies?

If this is deemed an acceptable contribution, I'm happy to update other code in the config directory.

KirkDCO avatar May 15 '24 17:05 KirkDCO