Refactor driver APIs
Currently, we have 2 types of CUDA Driver calls:
__meow(...) and __meowNoThrow(...)
I don't like this design: it forces us to duplicate the implementations and to use in/out parameters.
This PR refactors the driver APIs and keeps only 1 definition for each API. Each function is noexcept and returns a std::expected-like type that carries the error together with the return values.
When throwing behaviour is needed, the _CCCL_TRY_DRIVER_API macro is provided; it checks the return value of the CUDA call and returns the output parameters. Alternatively, _CCCL_ASSERT_DRIVER_API can be used to assert that a call succeeded.
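To make the shape of the new API concrete, here is a rough sketch of the pattern described above. Every name in it (fake_error, driver_result, fake_device_get_attribute, device_get_attribute, value_or_throw, value_or_assert) is a hypothetical stand-in, not the actual CCCL type, macro, or CUDA driver entry point; the real _CCCL_TRY_DRIVER_API and _CCCL_ASSERT_DRIVER_API macros may well be implemented differently.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical error code; stands in for the driver's result enum.
enum class fake_error { success = 0, invalid_value = 1 };

// Minimal expected-like type: an error code plus a value that is
// meaningful only when the call succeeded.
template <class T>
struct driver_result
{
  fake_error error;
  T value;

  bool has_value() const noexcept { return error == fake_error::success; }
};

// Stand-in for a C-style driver entry point with an out parameter.
// It fails for negative device ordinals so both paths can be exercised.
inline fake_error fake_device_get_attribute(int* value, int device) noexcept
{
  if (value == nullptr || device < 0)
  {
    return fake_error::invalid_value;
  }
  *value = 42;
  return fake_error::success;
}

// The single noexcept wrapper: no out parameter, the error and the value
// are returned together.
inline driver_result<int> device_get_attribute(int device) noexcept
{
  int value = 0;
  const fake_error err = fake_device_get_attribute(&value, device);
  return {err, value};
}

// Throwing adaptor for callers that want exceptions (the role the
// try-style macro plays in the description above).
template <class T>
T value_or_throw(const driver_result<T>& r, const char* what)
{
  if (!r.has_value())
  {
    throw std::runtime_error(what);
  }
  return r.value;
}

// Asserting adaptor for callers that treat failure as a programming error.
template <class T>
T value_or_assert(const driver_result<T>& r) noexcept
{
  assert(r.has_value());
  return r.value;
}
```

With this shape, a caller that wants exceptions writes something like value_or_throw(device_get_attribute(0), "query failed"), while a noexcept caller inspects has_value() directly, and there is only one definition of the wrapper itself.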
ok, now I'm confused. In another PR you suggested that I avoid std::span and use raw arrays to keep the implementation minimal. In this PR, we are introducing std::expected, which is way more complex.
Non-blocking comments:
- I discussed this with @pciolkosz a while back; my opinion is that the entire driver_api.h should just be auto-generated. We can already do this same thing for cuda-bindings (here), so this is trivial, and we have a plan to expand it further.
- It has occurred to me that _CCCLRT_GET_DRIVER_FUNCTION does not cache the retrieved function pointers. There will be a small performance penalty. (It's the reason that we cached them in cuda-bindings.)
ok, now I'm confused. In another PR you suggested that I avoid std::span and use raw arrays to keep the implementation minimal. In this PR, we are introducing std::expected, which is way more complex.
... returns std::expected-like type ...
I discussed this with @pciolkosz a while back; my opinion is that the entire driver_api.h should just be auto-generated.
I don't think a file that auto-generates the header would contain less code, because we would still want to adapt the API to our needs.
It has occurred to me that _CCCLRT_GET_DRIVER_FUNCTION does not cache the retrieved function pointers.
We store the function addresses in static members inside the functions, so _CCCLRT_GET_DRIVER_FUNCTION is called only once per program launch.
I don't think a file that auto-generates the header would contain less code, because we would still want to adapt the API to our needs.
It is about avoiding hand-writing driver_api.h, in a future-proof, safer fashion. The call sites still need to be hand-written.
Hand-writing bindings is error-prone and does not scale as we expand the project scope and coverage.
We store the function addresses in static members inside the functions, so _CCCLRT_GET_DRIVER_FUNCTION is called only once per program launch.
Function-local static variables are what I missed, thanks!
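For anyone else reading along, the caching discussed in this thread can be sketched roughly as follows. This is only an illustration under assumptions: lookup_driver_function, fake_driver_entry, lookup_count and call_driver are hypothetical names, and the lookup here is a stand-in for whatever _CCCLRT_GET_DRIVER_FUNCTION actually does.

```cpp
#include <cassert>

// Hypothetical signature of a driver entry point.
using driver_fn = int (*)(int);

// Counter used only to show how often the lookup runs.
inline int& lookup_count() noexcept
{
  static int count = 0;
  return count;
}

// Stand-in for the real driver function.
inline int fake_driver_entry(int x) noexcept
{
  return x + 1;
}

// Stand-in for the (comparatively expensive) symbol lookup.
inline driver_fn lookup_driver_function(const char* /*name*/) noexcept
{
  ++lookup_count();
  return &fake_driver_entry;
}

// Wrapper that caches the pointer in a function-local static. The
// initializer runs on the first call only, and that initialization is
// thread-safe since C++11.
inline int call_driver(int x) noexcept
{
  static const driver_fn fn = lookup_driver_function("fakeEntry");
  return fn(x);
}
```

Calling call_driver repeatedly performs the lookup once; later calls reuse the cached pointer, which is why no separate cache is needed at the call sites.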