Improve performance of RFECV visualizer

Open · bbengfort opened this issue 5 years ago • 1 comment

This PR fixes #1047, which reported that yellowbrick's internal implementation of RFECV is much slower than scikit-learn's. It introduces a new implementation of RFECV that stays closer to the latest version of sklearn.feature_selection.RFECV.

I have made the following changes:

  1. Updated RFECV with new parameters to match the sklearn implementation
  2. Implemented an _RFECV that subclasses the sklearn implementation and adds our required functionality

Sample Code and Plot

If you are adding or modifying a visualizer, PLEASE include a sample plot here along with the code you used to generate it.

TODOs and questions

Still to do:

  • [ ] Update the docstrings and parameters of all classes, functions, and quick methods
  • [ ] Fix the still failing tests
  • [ ] Determine if this is the best path forward and reduces the most technical debt
  • [ ] See "TODO" and "HACK" comments in the code

Questions for the @DistrictDataLabs/team-oz-maintainers:

  • [ ] Does this require a new minimum version of scikit-learn?
  • [ ] Does this require us to add joblib as an optional dependency?
  • [ ] Is this maintainable in the long run?

@lwgray @rebeccabilbro and @jc639 -- it would be great if we could work on this together

CHECKLIST

  • [ ] Is the commit message formatted correctly?
  • [ ] Have you noted the new functionality/bugfix in the release notes of the next release?
  • [ ] Included a sample plot to visually illustrate your changes?
  • [ ] Do all of your functions and methods have docstrings?
  • [ ] Have you added/updated unit tests where appropriate?
  • [x] Have you updated the baseline images if necessary?
  • [ ] Have you run the unit tests using pytest?
  • [ ] Is your code style correct (are you using PEP8, pyflakes)?
  • [ ] Have you documented your new feature/functionality in the docs?
  • [ ] Have you built the docs using make html?

bbengfort avatar Mar 10 '20 13:03 bbengfort

Hi, is there a working solution for #1047 that speeds up RFECV to reach performance similar to the scikit-learn implementation?

fab375 avatar Dec 23 '20 08:12 fab375