cuml
cuml copied to clipboard
Native HPO support and implementation on conda in Windows
Pull Request: [Issue #5380] Enhancements to Hyperparameter Optimization Functions
Description
This PR addresses Issue #5380 by introducing enhancements to the Hyperparameter Optimization (HPO) functions in the cuML module. The goal of these changes is to make it more convenient to work with various Hyperparameter Optimization methods and improve the overall consistency and functionality of the cuML module.
Changes Made
- Introduced a new function
add_sklearn_documentation
to make it easier to import and document the HPO methods from scikit-learn (sklearn). - Added a collection of commonly used HPO methods from
sklearn.model_selection
to the cuML module. - Ensured that the documentation for these imported methods maintains a consistent structure and properly credits the scikit-learn developers.
- Ran the
pre_commit
code to ensure code quality and style standards are met.
If I've inadvertently omitted any HPO methods from sklearn.model_selection
, please don't hesitate to notify me so that we can include them in the cuML module.
Reviewers
cc: @wphicks
P.S.
- I read and followed the project's contribution guidelines
- I have mentioned and referenced Issue #5380 in this PR.
Thank you for your review and feedback on this PR.
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
Pull request vetters can view their responsibilities here.
Contributors can view more details about this message here.
@wphicks Regarding This pull request requires additional validation before any workflows can run on NVIDIA's runners.
What more should I do to make this go away?
I have made my final commit verified by signing with the SSH key, and I have also followed the pre_commit
mentioned in the project's contribution guidelines
@wphicks Regarding
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
What more should I do to make this go away? I have made my final commit verified by signing with the SSH key, and I have also followed the
pre_commit
mentioned in the project's contribution guidelines
@Zekrom-7780 There is nothing you have to do at the moment. We will trigger CI runs once we have addressed some issues with our CI pipeline.
Apologies @Zekrom-7780! This came in while I was on vacation. The changes look very nice! At our internal meeting today, I'll check in with the team about what level of testing we'll want for features included in this way and follow up here afterward.
/ok to test
@wphicks @dantegd , could you tell me where I'm going wrong?
@Zekrom-7780 nothing wrong on your side, the PR needed the changes from #5661 so I just merged branch-23.12
and that should filx things
/ok to test
@wphicks @dantegd , I changed the code, as I saw in the tests that Sklearn didn't have a BayesSearchCV, so I added changed the CVs and added them in these changes
Thanks very much @Zekrom-7780! Could you add some very basic tests demonstrating that these methods work with a cuML estimator? No need to repeat all of the testing that already exists in sklearn; just show that they are functional from the new import location.
Sure @wphicks , I'll add some tests here, but where should I add them? Also can I pick up more issues along with this one if that's fine with you?
@Zekrom-7780 You can add a new file here: https://github.com/rapidsai/cuml/tree/branch-23.12/python/cuml/tests. And of course we'd be delighted if you want to pick up more issues! Thank you so much for this contribution. It looks great, and I can't wait to see what else you'd like to work on. Feel free to ping me directly if you have questions.
@wphicks @dantegd , added Pytests. I dont know like how to write them, so that took some time, but please tell me if I need to add more or different tests.
Thanks, @Zekrom-7780! In terms of tests, I was thinking something that just performs a very basic test of functionality. I.e. You could basically directly borrow from Scikit-Learn here, but rather than using their MockClassifier
, create a nearly-identical MockClassifier
that does inherit from cuML's base estimator class. That should give us good confidence that we won't have any issues with e.g. assumptions related to numpy
instead of cupy
or something like that.
Let me know if you can tackle that or if you need more help in prepping it. We can always sync up on the public RAPIDS Slack channel if you'd like to iron it out together. Really appreciate you taking the time to figure out pytest generally!
Thanks a lot @wphicks , I tried building a class similar to what you suggested to use cupy
instead of numpy
, but I wasn't able to do it.
I've joined the Slack channel, can I reach out to you regarding this?
@Zekrom-7780 Of course! I'm "William Hicks" on the Slack channel.