Refactor driver APIs

Open davebayer opened this issue 1 month ago • 5 comments

Currently, we have 2 types of CUDA Driver calls:

  1. __meow(...) and
  2. __meowNoThrow(...)

I don't like this design: it forces us to duplicate each implementation and to use in/out parameters.

This PR refactors the driver APIs and keeps only one definition for each API. Each function is noexcept and returns a std::expected-like type that holds either the error or the return values.

When throwing behaviour is required, the _CCCL_TRY_DRIVER_API macro is provided; it checks the return value of the CUDA call and returns the output values. Alternatively, _CCCL_ASSERT_DRIVER_API can be used to assert that a call succeeded.
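
To illustrate the idea, here is a minimal sketch of the pattern, not the code in this PR. Everything in it is a hypothetical stand-in: fake_error, driver_result, fake_device_get_count, device_count, always_fails, value_or_throw and value_or_assert are names invented for the example, and plain functions are used where the PR uses the _CCCL_TRY_DRIVER_API and _CCCL_ASSERT_DRIVER_API macros.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical stand-ins; none of these names come from CCCL or the CUDA driver.
enum class fake_error { success = 0, invalid_value = 1 };

// Minimal expected-like result: the error code and the value travel together.
template <class T>
struct driver_result {
  fake_error error;
  T value;
  bool has_value() const noexcept { return error == fake_error::success; }
};

// Stand-in for a raw driver entry point that reports through an out parameter.
inline fake_error fake_device_get_count(int* count) noexcept {
  if (count == nullptr) return fake_error::invalid_value;
  *count = 2;
  return fake_error::success;
}

// The single noexcept wrapper: no out parameter is exposed to the caller.
inline driver_result<int> device_count() noexcept {
  int count = 0;
  const fake_error status = fake_device_get_count(&count);
  return {status, count};
}

// A wrapper that always fails, only here to exercise the error path.
inline driver_result<int> always_fails() noexcept {
  return {fake_error::invalid_value, 0};
}

// Throwing flavour, layered on top of the noexcept wrapper.
template <class T>
T value_or_throw(driver_result<T> result) {
  if (!result.has_value()) throw std::runtime_error("driver call failed");
  return std::move(result.value);
}

// Asserting flavour, for call sites where a failure is a programming error.
template <class T>
T value_or_assert(driver_result<T> result) noexcept {
  assert(result.has_value() && "driver call failed");
  return std::move(result.value);
}
```

A call site then picks the behaviour it wants, for example `int n = value_or_throw(device_count());`, while the wrapper itself is written only once.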

davebayer avatar Nov 14 '25 16:11 davebayer

OK, now I'm confused. In another PR you suggested that I avoid std::span and use raw arrays to keep the implementation minimal. In this PR, we are introducing std::expected, which is far more complex.

fbusato avatar Nov 14 '25 17:11 fbusato

Non-blocking comments:

  • I discussed this with @pciolkosz a while back. My opinion is that the entire driver_api.h should just be auto-generated. We can already do the same thing for cuda-bindings, so this is trivial, and we have a plan to expand it further.
  • It has occurred to me that _CCCLRT_GET_DRIVER_FUNCTION does not cache the retrieved function pointers, so there will be a small performance penalty. (That is the reason we cache them in cuda-bindings.)

leofang avatar Nov 14 '25 17:11 leofang

> OK, now I'm confused. In another PR you suggested that I avoid std::span and use raw arrays to keep the implementation minimal. In this PR, we are introducing std::expected, which is far more complex.

... returns std::expected-like type ...

> I discussed this with @pciolkosz a while back. My opinion is that the entire driver_api.h should just be auto-generated.

I don't think a generator for the header would contain less code, because we would still want to adapt the API to our needs.

> It has occurred to me that _CCCLRT_GET_DRIVER_FUNCTION does not cache the retrieved function pointers

We store the function addresses in static variables inside the functions, so _CCCLRT_GET_DRIVER_FUNCTION is called only once per program launch.
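
For anyone following along, a minimal sketch of that caching pattern, again with hypothetical names only (fake_fn, fake_double, load_symbol, call_driver and lookup_count are invented for the example; load_symbol plays the role of the expensive lookup, not the actual _CCCLRT_GET_DRIVER_FUNCTION macro):

```cpp
// Hypothetical function-pointer type for a driver entry point.
using fake_fn = int (*)(int);

// Counts how many times the lookup runs (C++17 inline variable).
inline int lookup_count = 0;

// Stand-in for the function the lookup resolves to.
inline int fake_double(int x) { return 2 * x; }

// Stand-in for an expensive symbol lookup.
inline fake_fn load_symbol() {
  ++lookup_count;
  return &fake_double;
}

inline int call_driver(int x) {
  // A function-local static is initialized on the first call only, and
  // C++11 and later guarantee that this initialization is thread-safe.
  static const fake_fn fn = load_symbol();
  return fn(x);
}
```

However many times call_driver is invoked, load_symbol runs once, so the lookup cost is paid only on the first call.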

davebayer avatar Nov 14 '25 17:11 davebayer

> I don't think a generator for the header would contain less code, because we would still want to adapt the API to our needs.

It is about avoiding hand-writing driver_api.h, in a future-proof and safer fashion. The call sites still need to be hand-written.

Hand-writing bindings is error-prone and does not scale as we expand the project's scope and coverage.

> We store the function addresses in static variables inside the functions, so _CCCLRT_GET_DRIVER_FUNCTION is called only once per program launch.

Function-local static variables are what I missed, thanks!

leofang avatar Nov 14 '25 18:11 leofang

🥳 CI Workflow Results

🟩 Finished in 2h 22m: Pass: 100%/120 | Total: 1d 09h | Max: 2h 02m | Hits: 92%/230509

github-actions[bot] avatar Nov 14 '25 20:11 github-actions[bot]

😬 CI Workflow Results

🟥 Finished in 1h 24m: Pass: 3%/120 | Total: 7h 31m | Max: 44m 45s | Hits: 95%/1944

github-actions[bot] avatar Nov 18 '25 13:11 github-actions[bot]

😬 CI Workflow Results

🟥 Finished in 2h 36m: Pass: 52%/120 | Total: 1d 16h | Max: 1h 53m | Hits: 92%/132601

github-actions[bot] avatar Nov 18 '25 17:11 github-actions[bot]