
[ENH]: Clean up `SUPPORTED_...` variables between load libs and find header dirs

[Open] rwgk opened this issue 4 months ago · 2 comments

Make the variable names more uniform, decide what should be public, document all public variables.

Relevant files:

  • cuda_pathfinder/cuda/pathfinder/_dynamic_libs/supported_nvidia_libs.py
  • cuda_pathfinder/cuda/pathfinder/_headers/supported_nvidia_headers.py

See also: comments under PR #1194

rwgk commented Oct 29 '25

To document what I had in mind for the public SUPPORTED_... variables:

The original reason for exposing SUPPORTED_NVIDIA_LIBNAMES and SUPPORTED_HEADERS_CTK was to give users a way to discover what's available programmatically, so they're not forced to rely on try/except handling.

Regarding the CTK vs non-CTK distinction: when I first implemented load_nvidia_dynamic_lib(), I hadn’t yet considered this distinction, but as I learned more, I came to believe it’s a useful one — the expectations for the two categories differ significantly.

My preferred design for the public API would be:

  • SUPPORTED_NVIDIA_LIBNAMES_CTK, SUPPORTED_NVIDIA_LIBNAMES_NON_CTK

  • SUPPORTED_HEADERS_CTK, SUPPORTED_HEADERS_NON_CTK

We could keep SUPPORTED_NVIDIA_LIBNAMES for backward compatibility but deprecate it. It’s easy for users to make their intent explicit, e.g. SUPPORTED_NVIDIA_LIBNAMES_CTK + SUPPORTED_NVIDIA_LIBNAMES_NON_CTK, or using the | operator for the SUPPORTED_HEADERS_... variants.
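To illustrate the "explicit intent" point, here is a minimal sketch of how users could combine the proposed variables. All names and values below are placeholders: the actual contents of the SUPPORTED_... variables are platform-dependent, and the assumption that the libname variables are tuples and the header variables are dicts is inferred from the + and | operators mentioned above, not confirmed.

```python
# Placeholder values; the real variables are platform-dependent.
SUPPORTED_NVIDIA_LIBNAMES_CTK = ("cublas", "cusolver")
SUPPORTED_NVIDIA_LIBNAMES_NON_CTK = ("nccl", "nvshmem")

# Tuples concatenate with +:
all_libnames = SUPPORTED_NVIDIA_LIBNAMES_CTK + SUPPORTED_NVIDIA_LIBNAMES_NON_CTK

# Assuming the header variables are dicts (libname -> header file),
# they merge with the | operator (Python 3.9+):
SUPPORTED_HEADERS_CTK = {"cublas": "cublas.h"}
SUPPORTED_HEADERS_NON_CTK = {"nccl": "nccl.h"}
all_headers = SUPPORTED_HEADERS_CTK | SUPPORTED_HEADERS_NON_CTK
```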

Note that the values of the SUPPORTED_... variables are generally platform-dependent, which is why providing a public mechanism to inspect them programmatically seems important.

I didn’t give much weight to docstring considerations, although I agree it would be useful to show the supported libnames. That list would need to represent a superset across all platforms.

rwgk commented Oct 29 '25

Capturing what I shared with Ralf offline, since it was not captured above.

I don't think the CTK/non-CTK distinction is super important. Our public API names, such as find_nvidia_header_directory, have "nvidia" instead of "cuda" in the names. So anything from NVIDIA could potentially fit.

For the documentation, the issue is that it's unclear which libraries are supported and which are not; for example, NCCL is not shown in the doc (https://github.com/NVIDIA/cuda-python/pull/1194#discussion_r2471165220). I don't think the platform differences (e.g. NCCL does not support Windows) are important enough for us to prefer a particular platform or bend over backwards and make this unnecessarily hard.

Let’s ensure the docs capture the support list via an f-string approach: we assemble the docstrings of find_nvidia_header_directory and load_nvidia_dynamic_lib at module import time, populating a list like this in the docstrings (untested):

r"""
Supported libraries include:

""" + "\n".join(
    f"   - \"{lib}\"" for lib in SUPPORTED_NVIDIA_LIBNAMES_CTK + SUPPORTED_NVIDIA_LIBNAMES_NON_CTK
)

which gives something like

"""
Supported libraries include:

   - "cublas"
   - "cusolver"
   - ...
   - "cutensor"
   - "nvshmem"
   - "nccl"
   - ...

"""

and then we don’t have to think about exposing any of the SUPPORTED* module variables.
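For concreteness, the import-time assembly above could look like the following self-contained sketch. The SUPPORTED_... names, their values, and the load_nvidia_dynamic_lib stub are placeholders, not the actual cuda.pathfinder definitions.

```python
# Placeholder values; the real variables are platform-dependent.
SUPPORTED_NVIDIA_LIBNAMES_CTK = ("cublas", "cusolver")
SUPPORTED_NVIDIA_LIBNAMES_NON_CTK = ("nvshmem", "nccl")


def load_nvidia_dynamic_lib(libname):
    """Load an NVIDIA dynamic library by name (sketch only)."""
    raise NotImplementedError


# Append the support list once, at module import time:
load_nvidia_dynamic_lib.__doc__ += "\n\nSupported libraries include:\n\n" + "\n".join(
    f'   - "{lib}"'
    for lib in SUPPORTED_NVIDIA_LIBNAMES_CTK + SUPPORTED_NVIDIA_LIBNAMES_NON_CTK
)
```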

For those already exposed (SUPPORTED_NVIDIA_LIBNAMES, SUPPORTED_HEADERS_CTK) we can find a way to deprecate them later (it’s doable via module-level __getattr__).

leofang commented Oct 30 '25