[FEA]: Expose NVML events as async functions
Is this a duplicate?
- [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
cuda.core
Is your feature request related to a problem? Please describe.
As of #1448, cuda.core.system exposes NVML events using a wait function with a timeout. The natural way to expose this to Python would be an async function; however, I'm wary of adding this functionality without a concrete use case and good testing. Filing this issue as a placeholder in case that arises.
Describe the solution you'd like
NA
Describe alternatives you've considered
NA
Additional context
NA
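For context, one common pattern for exposing a blocking wait-with-timeout as an async function is to run it in the event loop's thread-pool executor. A minimal sketch below; `wait_for_event` is a hypothetical stand-in, not the actual cuda.core.system API:

```python
import asyncio
import time

def wait_for_event(timeout_ms: int) -> bool:
    """Stand-in for a blocking NVML event wait (hypothetical; not the
    actual cuda.core.system API). Returns True if an event arrived
    before the timeout, False otherwise."""
    time.sleep(timeout_ms / 1000)
    return False

async def wait_for_event_async(timeout_ms: int) -> bool:
    # Run the blocking wait in the default thread-pool executor so the
    # event loop stays responsive while the call blocks.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, wait_for_event, timeout_ms)

async def main() -> None:
    got_event = await wait_for_event_async(10)
    print(got_event)

asyncio.run(main())
```

This keeps the blocking implementation as the single source of truth, with the async version as a thin wrapper, which fits the "non-async interface regardless" point raised below.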
It'd be nice to know what the latency requirement is for NVML event queries. If it needs to be very low (by how much?), I've heard in the past from @pentschev based on his experience in UCX-py/UCXX that Python async event loops may not be performant enough.
That's correct. I don't mind having an async interface, but I think a non-async interface needs to be provided regardless. I have previously analyzed the overhead, and async can be prohibitively expensive (progressively slower from a plain coroutine, to a coroutine with a future, to a task); even in the most basic case of a plain coroutine, it's 3x the cost of a regular blocking Python function. So while async would be a nice convenience, providing only async versions is probably insufficient for performance-critical areas.
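The shape of that overhead can be reproduced with a small microbenchmark (illustrative only; the exact ratio depends on the Python version and machine). Driving the coroutine by hand with `send` isolates the coroutine machinery itself, without event-loop scheduling costs:

```python
import timeit

def blocking() -> int:
    # Baseline: a plain blocking Python function call.
    return 42

async def coro() -> int:
    return 42

def run_coro() -> int:
    # Drive the coroutine to completion manually, measuring only the
    # coroutine object creation and send/StopIteration machinery.
    c = coro()
    try:
        c.send(None)
    except StopIteration as e:
        return e.value

n = 100_000
t_block = timeit.timeit(blocking, number=n)
t_coro = timeit.timeit(run_coro, number=n)
print(f"plain call: {t_block:.4f}s, coroutine: {t_coro:.4f}s, "
      f"ratio: {t_coro / t_block:.1f}x")
```

Scheduling the coroutine through an event loop (wrapping it in a future or a task) adds further overhead on top of this, which is the progression described above.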