cuml icon indicating copy to clipboard operation
cuml copied to clipboard

Native HPO support and implementation on conda in Windows

Open Zekrom-7780 opened this issue 1 year ago • 16 comments

Pull Request: [Issue #5380] Enhancements to Hyperparameter Optimization Functions

Description

This PR addresses Issue #5380 by introducing enhancements to the Hyperparameter Optimization (HPO) functions in the cuML module. The goal of these changes is to make it more convenient to work with various Hyperparameter Optimization methods and improve the overall consistency and functionality of the cuML module.

Changes Made

  • Introduced a new function add_sklearn_documentation to make it easier to import and document the HPO methods from scikit-learn (sklearn).
  • Added a collection of commonly used HPO methods from sklearn.model_selection to the cuML module.
  • Ensured that the documentation for these imported methods maintains a consistent structure and properly credits the scikit-learn developers.
  • Ran the pre_commit code to ensure code quality and style standards are met.

If I've inadvertently omitted any HPO methods from sklearn.model_selection, please don't hesitate to notify me so that we can include them in the cuML module.

Reviewers

cc: @wphicks

P.S.

  • I read and followed the project's contribution guidelines
  • I have mentioned and referenced Issue #5380 in this PR.

Thank you for your review and feedback on this PR.

Zekrom-7780 avatar Nov 03 '23 11:11 Zekrom-7780

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Nov 03 '23 11:11 copy-pr-bot[bot]

@wphicks Regarding This pull request requires additional validation before any workflows can run on NVIDIA's runners.

What more should I do to make this go away? I have made my final commit verified by signing with the SSH key, and I have also followed the pre_commit mentioned in the project's contribution guidelines

Zekrom-7780 avatar Nov 07 '23 19:11 Zekrom-7780

@wphicks Regarding This pull request requires additional validation before any workflows can run on NVIDIA's runners.

What more should I do to make this go away? I have made my final commit verified by signing with the SSH key, and I have also followed the pre_commit mentioned in the project's contribution guidelines

@Zekrom-7780 There is nothing you have to do at the moment. We will trigger CI runs once we have addressed some issues with our CI pipeline.

csadorf avatar Nov 08 '23 14:11 csadorf

Apologies @Zekrom-7780! This came in while I was on vacation. The changes look very nice! At our internal meeting today, I'll check in with the team about what level of testing we'll want for features included in this way and follow up here afterward.

wphicks avatar Nov 16 '23 15:11 wphicks

/ok to test

dantegd avatar Nov 20 '23 17:11 dantegd

@wphicks @dantegd , could you tell me where I'm going wrong?

Zekrom-7780 avatar Nov 20 '23 23:11 Zekrom-7780

@Zekrom-7780 nothing wrong on your side, the PR needed the changes from #5661 so I just merged branch-23.12 and that should filx things

dantegd avatar Nov 21 '23 15:11 dantegd

/ok to test

dantegd avatar Nov 21 '23 15:11 dantegd

@wphicks @dantegd , I changed the code, as I saw in the tests that Sklearn didn't have a BayesSearchCV, so I added changed the CVs and added them in these changes

Zekrom-7780 avatar Nov 21 '23 16:11 Zekrom-7780

Thanks very much @Zekrom-7780! Could you add some very basic tests demonstrating that these methods work with a cuML estimator? No need to repeat all of the testing that already exists in sklearn; just show that they are functional from the new import location.

wphicks avatar Nov 21 '23 22:11 wphicks

Sure @wphicks , I'll add some tests here, but where should I add them? Also can I pick up more issues along with this one if that's fine with you?

Zekrom-7780 avatar Nov 21 '23 23:11 Zekrom-7780

@Zekrom-7780 You can add a new file here: https://github.com/rapidsai/cuml/tree/branch-23.12/python/cuml/tests. And of course we'd be delighted if you want to pick up more issues! Thank you so much for this contribution. It looks great, and I can't wait to see what else you'd like to work on. Feel free to ping me directly if you have questions.

wphicks avatar Nov 22 '23 14:11 wphicks

@wphicks @dantegd , added Pytests. I dont know like how to write them, so that took some time, but please tell me if I need to add more or different tests.

Zekrom-7780 avatar Nov 26 '23 19:11 Zekrom-7780

Thanks, @Zekrom-7780! In terms of tests, I was thinking something that just performs a very basic test of functionality. I.e. You could basically directly borrow from Scikit-Learn here, but rather than using their MockClassifier, create a nearly-identical MockClassifier that does inherit from cuML's base estimator class. That should give us good confidence that we won't have any issues with e.g. assumptions related to numpy instead of cupy or something like that.

Let me know if you can tackle that or if you need more help in prepping it. We can always sync up on the public RAPIDS Slack channel if you'd like to iron it out together. Really appreciate you taking the time to figure out pytest generally!

wphicks avatar Nov 27 '23 14:11 wphicks

Thanks a lot @wphicks , I tried building a class similar to what you suggested to use cupy instead of numpy, but I wasn't able to do it. I've joined the Slack channel, can I reach out to you regarding this?

Zekrom-7780 avatar Nov 30 '23 01:11 Zekrom-7780

@Zekrom-7780 Of course! I'm "William Hicks" on the Slack channel.

wphicks avatar Nov 30 '23 03:11 wphicks