yellowbrick
Improve performance of RFECV visualizer
This PR fixes #1047, which reported that Yellowbrick's internal implementation of RFECV is much slower than scikit-learn's. It introduces a new implementation of RFECV that more closely follows the latest version of `sklearn.feature_selection.RFECV`.
I have made the following changes:
- Updated RFECV with new parameters to match the sklearn implementation
- Implemented an `_RFECV` class that subclasses the sklearn implementation and adds our required functionality
Sample Code and Plot
If you are adding or modifying a visualizer, PLEASE include a sample plot here along with the code you used to generate it.
TODOs and questions
Still to do:
- [ ] Update the docstrings and parameters of all classes, functions, and quick methods
- [ ] Fix the still failing tests
- [ ] Determine if this is the best path forward and reduces the most technical debt
- [ ] See "TODO" and "HACK" comments in the code
Questions for the @DistrictDataLabs/team-oz-maintainers:
- [ ] Does this require a new minimum version of scikit-learn?
- [ ] Does this require us to add joblib as an optional dependency?
- [ ] Is this maintainable in the long run?
@lwgray, @rebeccabilbro, and @jc639: it would be great if we could work on this together.
CHECKLIST
- [ ] Is the commit message formatted correctly?
- [ ] Have you noted the new functionality/bugfix in the release notes of the next release?
- [ ] Included a sample plot to visually illustrate your changes?
- [ ] Do all of your functions and methods have docstrings?
- [ ] Have you added/updated unit tests where appropriate?
- [x] Have you updated the baseline images if necessary?
- [ ] Have you run the unit tests using `pytest`?
- [ ] Is your code style correct (are you using PEP8, pyflakes)?
- [ ] Have you documented your new feature/functionality in the docs?
- [ ] Have you built the docs using `make html`?
Hi, is there a working solution for #1047 that speeds up RFECV to reach performance similar to the scikit-learn implementation?