integrate cuOpt solver into cvxpy

Open tmckayus opened this issue 5 months ago • 11 comments

This change integrates the cuOpt solver with cvxpy.

cuOpt is now open-source and can be found at https://github.com/NVIDIA/cuopt

The integration supports a locally installed cuopt instance by default. It also supports use of the cuopt REST service if a cuopt instance is located on another machine. This can be useful when a GPU is not available on the host running cvxpy, but is available on a remote host with cuopt installed.

tmckayus avatar Jun 04 '25 18:06 tmckayus

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

CLAassistant avatar Jun 04 '25 18:06 CLAassistant

Will you be able to sign the CLA? I see NVIDIA copyright notices in your code.

PTNobel avatar Jun 04 '25 19:06 PTNobel

Will you be able to sign the CLA? I see NVIDIA copyright notices in your code.

Waiting for approval to sign currently ... I searched for a CLA before I submitted but missed it, so I went back for additional approval.

tmckayus avatar Jun 04 '25 19:06 tmckayus

Will you be able to sign the CLA? I see NVIDIA copyright notices in your code.

Waiting for approval to sign currently ... I searched for a CLA before I submitted but missed it, so I went back for additional approval.

@PTNobel question from compliance group "Do they [cvxpy] have a corporate CLA?"

tmckayus avatar Jun 05 '25 13:06 tmckayus

overall looks good (apart from some linting issues)! I'll let @PTNobel do a more thorough review on this.

@tmckayus is the current state for testing to not have any tests in cvxpy (or at least to skip them because cuopt won't be installed)? We will be adding other GPU solvers, so we can revisit this at some point. It might be possible to get a GPU in a container in the CI or something like that.

Hi @Transurgeon with the 25.05 release, cuOpt is open-source and can be installed via pip or conda so we could easily get it on a test runner machine. The GPU is the harder part.

Yes, we could run nightly cvxpy tests on internal infra. We'd have to figure out what the test suite should be beyond the unit tests I already included, if we need anything else

tmckayus avatar Jun 05 '25 13:06 tmckayus

I put these questions/notes on the associated issue but I should probably repost them here as well :)

I included support for using the cuopt service on a remote host, although this is not the default. If only the client is installed, or if the CUOPT_USE_SERVICE env var is set to True, cvxpy will use a client to submit to cuopt remotely. If this is an anti-pattern for cvxpy, we can remove the service support.

I was unsure how to handle the citation, since there is no one particular author and the project has been around for a while. I based it on the GLOP citation, and set a date of 5-29 since that is when the code was made public.

We probably need some documentation on cuopt settings. Not sure where to include solver-specific docs in cvxpy

tmckayus avatar Jun 05 '25 13:06 tmckayus

@Transurgeon @PTNobel

I likely have found a problem with our formulation for MIP, based on miplib benchmark, so I converted this to a draft. It may take me a while to get back to this but it is high on my list. Will update when I have some more info.

tmckayus avatar Jun 05 '25 19:06 tmckayus

@PTNobel question from compliance group "Do they [cvxpy] have a corporate CLA?"

We don't as far as I recall. I don't think anyone's asked before.

I included support for using the cuopt service on a remote host, although this is not the default. If only the client is installed, or if the CUOPT_USE_SERVICE env var is set to True, cvxpy will use a client to submit to cuopt remotely. If this is an anti-pattern for cvxpy, we can remove the service support.

I'm happy to have service support, but I'd prefer to see it as a parameter to the .solve call as a third (and most user-friendly) option. You can use solver_opts.pop('use_service', False) in solve_via_data to get that.

I was unsure how to handle the citation, since there is no one particular author and the project has been around for a while. I based it on the GLOP citation, and set a date of 5-29 since that is when the code was made public.

If you're not interested in academic citations, it doesn't really matter. But this sounds fine.

We probably need some documentation on cuopt settings. Not sure where to include solver-specific docs in cvxpy

See doc/source/tutorial/solvers/index.rst. You should add CUOPT to the big table and add an infobox under Setting solver options.

PTNobel avatar Jun 07 '25 05:06 PTNobel
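
The precedence discussed above (a `use_service` solve option first, then the `CUOPT_USE_SERVICE` environment variable) can be sketched as follows. This is a minimal illustration, not the code in the PR: the helper name `resolve_use_service`, the accepted truthy strings, and the exact precedence are assumptions, and the "only the client is installed" case is left out.

```python
import os


def resolve_use_service(solver_opts, environ=None):
    """Decide whether to submit the problem to a remote cuOpt service.

    Hypothetical helper for illustration only. Assumed precedence:
    1. an explicit ``use_service`` entry in the solver options,
    2. the CUOPT_USE_SERVICE environment variable,
    3. otherwise solve locally.
    """
    environ = os.environ if environ is None else environ
    if "use_service" in solver_opts:
        # pop() removes the key so it is not forwarded to the solver itself.
        return bool(solver_opts.pop("use_service"))
    # Assumed set of truthy spellings; the thread only mentions "True".
    value = environ.get("CUOPT_USE_SERVICE", "")
    return value.strip().lower() in ("1", "true", "yes")
```

Because cvxpy forwards extra keyword arguments of `Problem.solve` to the solver interface as solver options, a call such as `prob.solve(solver="CUOPT", use_service=True)` would reach a helper like this inside `solve_via_data`.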

Yes, we could run nightly cvxpy tests on internal infra. We'd have to figure out what the test suite should be beyond the unit tests I already included, if we need anything else

Nightly is more frequent than we merge changes 😆 Would it be possible to get us access to a CI runner with a GPU? We've run our CI 11 times in the last week. We also have two other GPU solvers, and it would be nice to run the test suites for all three at once. The other two are MPAX and CuClarabel (in-progress: #2812), both academic projects.

PTNobel avatar Jun 07 '25 05:06 PTNobel

So I saw the following in your README:

We anticipate that the Python API will change significantly in the future. Use it at your own risk.

Which makes me think we need to have a good testing plan before merging. I'd appreciate any insights you can share on what's coming up in the future for APIs (and pre-volunteering to help with breakages would be loved even more).

PTNobel avatar Jun 07 '25 05:06 PTNobel

I'll do a review once you've marked this PR for merging!

PTNobel avatar Jun 07 '25 05:06 PTNobel

@tmckayus can we get some version of cuOpt integrated in the next few weeks? We are working on a 1.7 release and would like to have a cuOpt interface. If MILPs don't work we can still have an LP interface.

SteveDiamond avatar Jun 23 '25 16:06 SteveDiamond

@tmckayus can we get some version of cuOpt integrated in the next few weeks? We are working on a 1.7 release and would like to have a cuOpt interface. If MILPs don't work we can still have an LP interface.

Yes! I was gone on PTO for a while but I'm back, and I've convinced myself that the error I was seeing in MIP was in my driver, not in the core cuopt module included here. Moving back from Draft.

Thanks for all the feedback above, I will follow up ....

tmckayus avatar Jun 25 '25 19:06 tmckayus

So I saw the following in your README:

We anticipate that the Python API will change significantly in the future. Use it at your own risk.

Which makes me think we need to have a good testing plan before merging. I'd appreciate any insights you can share on what's coming up in the future for APIs (and pre-volunteering to help with breakages would be loved even more).

Absolutely we pre-volunteer to keep this in sync. We would not modify the API without fixing this up.

As for the roadmap, that is unclear at this point. That statement in the docs is meant to direct users to the service API, the new C API, or modeler integrations like this one! We plan on revamping the LP/MIP Python API at some point to be closer to existing API standards, but I'm not sure when that may happen. We just don't want users building big integrations around the current API without being warned. Anything we support ourselves, however, is fair game to use the API.

tmckayus avatar Jun 25 '25 19:06 tmckayus

Yes, we could run nightly cvxpy tests on internal infra. We'd have to figure out what the test suite should be beyond the unit tests I already included, if we need anything else

Nightly is more frequent than we merge changes 😆 Would it be possible to get us access to a CI runner with a GPU? We've run our CI 11 times in the last week. We also have two other GPU solvers, and it would be nice to run the test suites for all three at once. The other two are MPAX and CuClarabel (in-progress: #2812), both academic projects.

I can ask about this

tmckayus avatar Jun 25 '25 19:06 tmckayus

Also, can we add some docs about how to pick which GPU is used?

PTNobel avatar Jun 25 '25 22:06 PTNobel
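
One general way to pick a GPU for any CUDA-based library is the `CUDA_VISIBLE_DEVICES` environment variable, which the CUDA runtime reads when it initializes. The sketch below assumes that mechanism is sufficient here; the thread does not say whether the cuOpt interface exposes its own device option, and the helper name `select_gpu` is made up for illustration.

```python
import os


def select_gpu(index):
    """Restrict this process to one GPU via CUDA_VISIBLE_DEVICES.

    Must run before any CUDA-using library initializes the driver in
    this process; changing the variable afterwards has no effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Expose only physical GPU 0; inside the process it appears as device 0.
select_gpu(0)
```

The same effect can be had from the shell by setting the variable when launching the script, which avoids any ordering concerns with imports.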

Yes, we could run nightly cvxpy tests on internal infra. We'd have to figure out what the test suite should be beyond the unit tests I already included, if we need anything else

Nightly is more frequent than we merge changes 😆 Would it be possible to get us access to a CI runner with a GPU? We've run our CI 11 times in the last week. We also have two other GPU solvers, and it would be nice to run the test suites for all three at once. The other two are MPAX and CuClarabel (in-progress: #2812), both academic projects.

I can ask about this

The short answer is "we don't currently support runners for external projects". There is possibly funding we could apply for, which might provide an ultimate solution, but that won't help us in the short term. I think we're going to have to agree on a test suite and run it internally for now.

tmckayus avatar Jun 26 '25 14:06 tmckayus

I think we're going to have to agree on a test suite and run it internally for now.

That sounds good. Would you be open to running all the CVXPY GPU solvers? The rest are academic projects; I could put together a test_gpu_solvers.py in our test suite and write a Bash script in our continuous_integration folder next week that would be a good candidate to run all the GPU solvers.

PTNobel avatar Jun 26 '25 18:06 PTNobel
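
A rough idea of what such a `test_gpu_solvers.py` could look like, with each solver's test skipped when that solver is not installed. This is only a sketch: the solver name strings `"CUOPT"`, `"MPAX"`, and `"CUCLARABEL"` are assumptions about how cvxpy registers them, and the tiny LP and tolerance are placeholders rather than an agreed test suite.

```python
import unittest

# cvxpy is treated as optional so the file can still be loaded without it.
try:
    import cvxpy as cp
    INSTALLED = set(cp.installed_solvers())
except ImportError:
    cp = None
    INSTALLED = set()


class TestGPUSolvers(unittest.TestCase):
    def _solve_small_lp(self, solver_name):
        # minimize x0 + 2*x1  s.t.  x >= 0, x0 + x1 >= 1  (optimum is 1.0)
        x = cp.Variable(2)
        prob = cp.Problem(cp.Minimize(x[0] + 2 * x[1]),
                          [x >= 0, x[0] + x[1] >= 1])
        prob.solve(solver=solver_name)
        # Loose tolerance, since first-order GPU solvers are less precise.
        self.assertAlmostEqual(prob.value, 1.0, places=2)

    @unittest.skipUnless("CUOPT" in INSTALLED, "CUOPT is not installed")
    def test_cuopt(self):
        self._solve_small_lp("CUOPT")

    @unittest.skipUnless("MPAX" in INSTALLED, "MPAX is not installed")
    def test_mpax(self):
        self._solve_small_lp("MPAX")

    @unittest.skipUnless("CUCLARABEL" in INSTALLED, "CUCLARABEL is not installed")
    def test_cuclarabel(self):
        self._solve_small_lp("CUCLARABEL")
```

On a machine without a GPU solver every test is reported as skipped rather than failed, so the same file could run in regular CI and on an internal GPU runner.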

update on the CLA: I need to fill out some internal info, and then we need a VP to sign the CLA as the "legal entity authorized by the copyright owner" with a note that the VP is representing developers at NVIDIA as a whole. Does this work for you?

tmckayus avatar Jul 01 '25 19:07 tmckayus

update on the CLA: I need to fill out some internal info, and then we need a VP to sign the CLA as the "legal entity authorized by the copyright owner" with a note that the VP is representing developers at NVIDIA as a whole. Does this work for you?

Looks like the text at the top of the CLA says "please complete and send a signed CLA by scanning it and emailing it to: [email protected]."

If we can email you a scanned CLA, then we can also add a sentence that says the VP is signing for all NVIDIA developers. Does this work? Asking because the text mentions emailing but there is also a button to sign online. I want to make sure emailing is still okay.

@PTNobel

tmckayus avatar Jul 01 '25 20:07 tmckayus

"legal entity authorized by the copyright owner" with a note that the VP is representing developers at NVIDIA as a whole. Does this work for you?

That works for us!

If we can email you a scanned CLA, then we can also add a sentence that says the VP is signing for all NVIDIA developers. Does this work? Asking because the text mentions emailing but there is also a button to sign online. I want to make sure emailing is still okay.

That would be perfect. Thank you!

PTNobel avatar Jul 02 '25 23:07 PTNobel

"legal entity authorized by the copyright owner" with a note that the VP is representing developers at NVIDIA as a whole. Does this work for you?

That works for us!

If we can email you a scanned CLA, then we can also add a sentence that says the VP is signing for all NVIDIA developers. Does this work? Asking because the text mentions emailing but there is also a button to sign online. I want to make sure emailing is still okay.

That would be perfect. Thank you!

Excellent! I will follow up with this immediately. Also working on the changes requested above in parallel. This is my P0 currently so hopefully we will have something merge-able soon.

tmckayus avatar Jul 03 '25 13:07 tmckayus

I think we're going to have to agree on a test suite and run it internally for now.

That sounds good. Would you be open to running all the CVXPY GPU solvers? The rest are academic projects; I could put together a test_gpu_solvers.py in our test suite and write a Bash script in our continuous_integration folder next week that would be a good candidate to run all the GPU solvers.

The first question here is: are there other solvers that use CUDA? Only solvers written against the CUDA library will run on NVIDIA GPUs.

tmckayus avatar Jul 03 '25 18:07 tmckayus

The first question here is: are there other solvers that use CUDA? Only solvers written against the CUDA library will run on NVIDIA GPUs.

@PTNobel and @ZedongPeng recently added a solver interface for MPAX which uses the device configured by JAX. I assume that it would be possible to specify CUDA as the GPU device.

Transurgeon avatar Jul 03 '25 18:07 Transurgeon

The first question here is: are there other solvers that use CUDA? Only solvers written against the CUDA library will run on NVIDIA GPUs.

We have 2 solvers that link against cuDSS (SCS and CuClarabel.jl) and one that's JAX. The two cuDSS solvers only work with CUDA. JAX can easily be tested on the CPU, but I'd prefer to test it with CUDA since that's what most people use it with.

PTNobel avatar Jul 03 '25 19:07 PTNobel

The first question here is: are there other solvers that use CUDA? Only solvers written against the CUDA library will run on NVIDIA GPUs.

We have 2 solvers that link against cuDSS (SCS and CuClarabel.jl) and one that's JAX. The two cuDSS solvers only work with CUDA. JAX can easily be tested on the CPU, but I'd prefer to test it with CUDA since that's what most people use it with.

Asked/answered: we have limited resources for now and can't commit to additional testing beyond the cuopt integration.

tmckayus avatar Jul 07 '25 13:07 tmckayus

Update: I pushed changes based on the feedback, I still have to add the documentation piece (and fix the conflicts). I will work on this today. What is your target release date? Continuing to pursue the CLA as well.

Also, I added the pre-commit but I'm not sure it had any effect. Will check the contributing doc ....

tmckayus avatar Jul 07 '25 14:07 tmckayus

Update: I pushed changes based on the feedback, I still have to add the documentation piece (and fix the conflicts). I will work on this today. What is your target release date? Continuing to pursue the CLA as well.

Our target release date is July 16th, 2025, a few days before presenting at the ICCOPT conference (July 20th).

Transurgeon avatar Jul 08 '25 17:07 Transurgeon

Benchmarks that have stayed the same:

   before           after         ratio
 [d6bf54de]       [d8f950e1]
      229±0ms          236±0ms     1.03  gini_portfolio.Murray.time_compile_problem
      2.75±0s          2.83±0s     1.03  quantum_hilbert_matrix.QuantumHilbertMatrix.time_compile_problem
      821±0ms          836±0ms     1.02  simple_QP_benchmarks.LeastSquares.time_compile_problem
     43.3±0ms         44.1±0ms     1.02  matrix_stuffing.SmallMatrixStuffing.time_compile_problem
      1.35±0s          1.37±0s     1.02  matrix_stuffing.ParamConeMatrixStuffing.time_compile_problem
      707±0ms          719±0ms     1.02  matrix_stuffing.ConeMatrixStuffingBench.time_compile_problem
      1.62±0s          1.65±0s     1.02  tv_inpainting.TvInpainting.time_compile_problem
      4.95±0s          5.02±0s     1.01  optimal_advertising.OptimalAdvertising.time_compile_problem
      282±0ms          286±0ms     1.01  matrix_stuffing.ParamSmallMatrixStuffing.time_compile_problem
      4.50±0s          4.55±0s     1.01  huber_regression.HuberRegression.time_compile_problem
      11.1±0s          11.2±0s     1.01  simple_LP_benchmarks.SimpleLPBenchmark.time_compile_problem
      238±0ms          240±0ms     1.01  high_dim_convex_plasticity.ConvexPlasticity.time_compile_problem
      918±0ms          921±0ms     1.00  simple_LP_benchmarks.SimpleScalarParametrizedLPBenchmark.time_compile_problem
      12.8±0s          12.8±0s     1.00  finance.CVaRBenchmark.time_compile_problem
      21.8±0s          21.9±0s     1.00  sdp_segfault_1132_benchmark.SDPSegfault1132Benchmark.time_compile_problem
      2.35±0s          2.35±0s     1.00  simple_LP_benchmarks.SimpleFullyParametrizedLPBenchmark.time_compile_problem
      1.84±0s          1.84±0s     1.00  simple_QP_benchmarks.UnconstrainedQP.time_compile_problem
      288±0ms          289±0ms     1.00  slow_pruning_1668_benchmark.SlowPruningBenchmark.time_compile_problem
      5.30±0s          5.30±0s     1.00  semidefinite_programming.SemidefiniteProgramming.time_compile_problem
      5.09±0s          5.07±0s     1.00  svm_l1_regularization.SVMWithL1Regularization.time_compile_problem
      559±0ms          557±0ms     1.00  simple_QP_benchmarks.ParametrizedQPBenchmark.time_compile_problem
      339±0ms          337±0ms     0.99  gini_portfolio.Yitzhaki.time_compile_problem
      250±0ms          248±0ms     0.99  simple_QP_benchmarks.SimpleQPBenchmark.time_compile_problem
      1.08±0s          1.06±0s     0.99  finance.FactorCovarianceModel.time_compile_problem
      1.10±0s          1.05±0s     0.95  gini_portfolio.Cajas.time_compile_problem

github-actions[bot] avatar Jul 09 '25 06:07 github-actions[bot]

Okay, I think that's it! I will mail the signed CLA as soon as I have it, and keep monitoring this PR for additional feedback. Thank you all for the support!


okay addressed feedback and documented options

last thing is to add install/index.rst

tmckayus avatar Jul 09 '25 16:07 tmckayus