[fix] export failure with CUDA driver < 526 and pynvml>=11.5.0
- NVML has a bug that was fixed in the 526 driver release. For older drivers, the recommendation is to downgrade pynvml to 11.4.0 and use 11.5.0 only with driver 526 or newer.
This PR uses the legacy pynvml memory usage function when the driver version is older than 526, even with pynvml 11.5.0.
Also mentioned in a comment on the issue NVIDIA/TensorRT-LLM#808.
Thanks for addressing the pynvml issue related to the driver version. @CoderHam, could you share which doc (or link) you referred to when determining the driver version (526)?
@jaedeok-nvidia it took a while to dig up, but I followed the threads at https://forums.developer.nvidia.com/t/nvml-bug-nvmldevicegetcomputerunningprocesses-returns-compute-processes-for-all-gpu-devices/222337/2 and https://github.com/NVIDIA/k8s-device-plugin/issues/331#issuecomment-1498616763
These confirmed that missing symbols in the underlying NVML libraries prevent us from using the v2 API with drivers older than 526.
Hi @CoderHam, the changes have been integrated in https://github.com/NVIDIA/TensorRT-LLM/pull/1688 and we've credited you as a co-author, so I'm closing this PR now. Thanks a lot!