Add multi-CUDA wheel build
This PR introduces "multi-CUDA" wheel builds, similar to the ones we build for CCCL, as described here.
Instead of shipping multiple packages (pynvbench-cu12, pynvbench-cu13), we ship a single one (pynvbench) that contains both the CUDA 12 and CUDA 13 binaries (bindings). The resulting wheel is only ~2.2MB, so the added size is not a concern. At runtime, the appropriate binary is loaded based on the available CUDA version.
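For reviewers unfamiliar with the pattern, here is a minimal sketch of how runtime selection could work. It is not the code in this PR: the submodule naming scheme (`_bindings_cu12`, `_bindings_cu13`) and the helper names are hypothetical, and the sketch assumes the CUDA version string has already been obtained by some other means.

```python
import importlib
import re


def parse_cuda_major(version: str) -> int:
    """Extract the major version from a string such as '12.4' or '13.0.1'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse CUDA version: {version!r}")
    return int(match.group(1))


def select_binding_name(cuda_major: int, available=(12, 13)) -> str:
    """Return the (hypothetical) submodule name for a CUDA major version."""
    if cuda_major not in available:
        raise ImportError(
            f"no bundled binary for CUDA {cuda_major}; "
            f"available: {sorted(available)}"
        )
    return f"_bindings_cu{cuda_major}"


def load_binding(package: str, cuda_major: int):
    """Import the selected extension module from `package`.

    Assumes the wheel ships one extension module per CUDA major version
    under the naming scheme used by select_binding_name().
    """
    name = select_binding_name(cuda_major)
    return importlib.import_module(f"{package}.{name}")
```

Failing with an `ImportError` when no matching binary is bundled keeps the error close to what a user would see from a missing extension module.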
At install time, the user still needs to specify which CUDA major version to install for, because the versions of the required dependencies (e.g. nvidia-cuda-cupti-cu12) depend on the CUDA major version. The CUDA version is selected using "extras":
pip install pynvbench[cu12] # or cu13
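To check which extra's dependencies ended up in an environment, something like the following sketch could be used. The helper name `installed_cuda_majors` is hypothetical, and the mapping of CUDA major versions to distribution names is supplied by the caller; only `nvidia-cuda-cupti-cu12` is named in this PR.

```python
from importlib import metadata


def installed_cuda_majors(candidates):
    """Return the CUDA majors whose marker distribution is installed.

    `candidates` maps a CUDA major version (int) to the name of a
    distribution that the corresponding extra is assumed to pull in.
    """
    found = []
    for major, dist_name in sorted(candidates.items()):
        try:
            metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            continue  # this extra's dependency is not installed
        found.append(major)
    return found
```

For example, `installed_cuda_majors({12: "nvidia-cuda-cupti-cu12"})` returns `[12]` only if that distribution is present in the current environment.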
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
/ok to test 1ab0a2e93316753271b1b5a79d719e87cde35e2f
/ok to test 4b5fb32dea460fbaad7eae6f21ad2d6f5e2b0848
/ok to test a0206000f4e540f5ac712e37bf95cd0bb86e8104
/ok to test 18ecf118643e1ee84052eb9800e68a9c5cfa9d8e
/ok to test b62956b5004f0221f38c9f8ff903e37e11c12c7d
/ok to test edd31c764d32c3608c0a1bc1339fd335181017b8
/ok to test d178a089324e0aac866ecdf7a579bb8626f4d290
/ok to test fd2de8c9b0ddf5a6babfc79ca73b1ffcfd47a60d
/ok to test 1e2ef0fe04fb89be49363c9313cfeaf2aa2c23bc
/ok to test d54f264f23f350f805d3eed5eefce96e4ec7be6c
/ok to test 13cb606165d7beb364f243c0d41c72f79d534c8e
/ok to test 9af979d3f305ceb2f616e29f401ea1dd982d412a
/ok to test ec3615b1a5f17f1220b1f03311b4aaf92e779c5e