[FEA]: Expose NVML events as async functions
Is this a duplicate?
- [x] I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
cuda.core
Is your feature request related to a problem? Please describe.
As of #1448, cuda.core.system exposes NVML events using a wait function with a timeout. The natural way to expose this to Python would be an async function; however, I'm wary of adding this functionality without a concrete use case and good testing. Filing this issue as a placeholder in case that arises.
Describe the solution you'd like
NA
Describe alternatives you've considered
NA
Additional context
NA
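For context, one common pattern for exposing a blocking wait-with-timeout as an async function is to run it in the event loop's thread-pool executor. A minimal sketch below; `wait_for_event` is a hypothetical stand-in, not the actual cuda.core.system API:

```python
import asyncio
import time

def wait_for_event(timeout_ms: int) -> bool:
    """Stand-in for a blocking NVML event wait (hypothetical; not the
    actual cuda.core.system API). Returns True if an event arrived
    before the timeout, False otherwise."""
    time.sleep(timeout_ms / 1000)
    return False

async def wait_for_event_async(timeout_ms: int) -> bool:
    # Run the blocking wait in the default thread-pool executor so the
    # event loop stays responsive while the call blocks.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, wait_for_event, timeout_ms)

async def main() -> None:
    got_event = await wait_for_event_async(10)
    print(got_event)

asyncio.run(main())
```

This keeps the blocking implementation as the single source of truth, with the async version as a thin wrapper, which fits the "non-async interface regardless" point raised below.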
It'd be nice to know what the latency requirement is for NVML event queries. If it needs to be very low (by how much?), I've heard in the past from @pentschev based on his experience in UCX-py/UCXX that Python async event loops may not be performant enough.
That's correct. I don't mind having an async interface, but I think a non-async interface needs to be provided regardless. I have previously analyzed the overhead, and async can be prohibitively expensive (progressively slower from a plain coroutine, to a coroutine with a future, to a task); even in the most basic case of a plain coroutine, it's 3x the cost of a regular blocking Python function. So while async would be a nice convenience, providing only async versions is probably insufficient for performance-critical areas.
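The shape of that overhead can be reproduced with a small microbenchmark (illustrative only; the exact ratio depends on the Python version and machine). Driving the coroutine by hand with `send` isolates the coroutine machinery itself, without event-loop scheduling costs:

```python
import timeit

def blocking() -> int:
    # Baseline: a plain blocking Python function call.
    return 42

async def coro() -> int:
    return 42

def run_coro() -> int:
    # Drive the coroutine to completion manually, measuring only the
    # coroutine object creation and send/StopIteration machinery.
    c = coro()
    try:
        c.send(None)
    except StopIteration as e:
        return e.value

n = 100_000
t_block = timeit.timeit(blocking, number=n)
t_coro = timeit.timeit(run_coro, number=n)
print(f"plain call: {t_block:.4f}s, coroutine: {t_coro:.4f}s, "
      f"ratio: {t_coro / t_block:.1f}x")
```

Scheduling the coroutine through an event loop (wrapping it in a future or a task) adds further overhead on top of this, which is the progression described above.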