[FEA]: Relevant exceptions for cuCheckpointProcessGetState

Open jricker2 opened this issue 6 months ago • 1 comments

Is this a duplicate?

  • [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct

Area

cuda.bindings

Is your feature request related to a problem? Please describe.

This is a very small issue, but I am not sure whether it extends to other functions that I have not tested. For cuCheckpointProcessGetState, passing a PID that does not exist, or a PID that is not valid to be checkpointed, results in the following error:

>>> cu.cuCheckpointProcessGetState(123434)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cuda/bindings/driver.pyx", line 44467, in cuda.bindings.driver.cuCheckpointProcessGetState
  File "/usr/lib64/python3.11/enum.py", line 714, in __call__
    return cls.__new__(cls, value)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/enum.py", line 1137, in __new__
    raise ve_exc
ValueError: 32718 is not a valid CUprocessState

I have also seen values other than 32718 show up as the returned CUprocessState, seemingly at random.
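
The ValueError in the traceback is what Python's enum machinery raises when asked to wrap an integer that is not a member. A minimal sketch of that failure mode, assuming the binding converts the raw output value even when the driver call has failed and left it unset (the `ProcessState` class, its member values, and `convert_raw_state` are hypothetical stand-ins, not the real cuda.bindings definitions):

```python
from enum import IntEnum


class ProcessState(IntEnum):
    # Hypothetical stand-in for CUprocessState; member values are illustrative.
    RUNNING = 0
    LOCKED = 1
    CHECKPOINTED = 2
    FAILED = 3


def convert_raw_state(raw):
    """Wrap a raw integer in the enum unconditionally (assumed binding behavior)."""
    return ProcessState(raw)


try:
    convert_raw_state(32718)  # value taken from the traceback in this report
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

If that assumption holds, the "random" values would simply be whatever happened to be in the output variable when the call failed.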

Describe the solution you'd like

Consistent exceptions for common failures such as the PID not existing or being invalid. The cuda-checkpoint CLI gives the following message, which would be fine:

Error getting process state for process ID 1234234344: "OS call failed or operation not supported on this OS"
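
One possible shape for this, as a rough sketch only: check the driver's status code before touching the state value, and raise a single exception type on failure. Everything named here is hypothetical (`CheckpointError`, `get_process_state`, the `ProcessState` stand-in, the injected `raw_call`, and the numeric codes); it does not call cuda.bindings and only illustrates the ordering of the checks.

```python
from enum import IntEnum


class ProcessState(IntEnum):
    # Hypothetical stand-in for CUprocessState; member values are illustrative.
    RUNNING = 0
    LOCKED = 1
    CHECKPOINTED = 2
    FAILED = 3


class CheckpointError(RuntimeError):
    """Hypothetical exception carrying the PID and the driver status code."""

    def __init__(self, pid, code):
        super().__init__(
            f"Error getting process state for process ID {pid}: "
            f"driver returned status {code}"
        )
        self.pid = pid
        self.code = code


SUCCESS = 0  # assumed "no error" status code


def get_process_state(pid, raw_call):
    """Return a ProcessState, or raise CheckpointError if the call failed.

    raw_call is any callable returning (status_code, raw_state); it stands in
    for the low-level driver call so the sketch runs without a GPU.
    """
    code, raw_state = raw_call(pid)
    if code != SUCCESS:
        # Do not interpret raw_state: it may be uninitialized on failure.
        raise CheckpointError(pid, code)
    return ProcessState(raw_state)


def fake_failing_call(pid):
    # Simulates a failed call: nonzero status (illustrative) plus a garbage state.
    return 304, 32718


def fake_ok_call(pid):
    # Simulates a successful call reporting the LOCKED state.
    return SUCCESS, 1


try:
    get_process_state(123434, fake_failing_call)
    caught = None
except CheckpointError as exc:
    caught = exc

ok_state = get_process_state(123434, fake_ok_call)
```

With this ordering, a test can assert on one exception type and its `code` attribute instead of matching whichever ValueError text happens to appear.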

Describe alternatives you've considered

No response

Additional context

Inconsistent or irrelevant exceptions make unit testing around this area of the CUDA driver difficult and messy.

jricker2, Apr 17 '25 21:04