cuda_memtest
Handle GPUs that lack full NVML Support
NVIDIA's NVML does not support non-Tesla products very well. Problems are known with mobile cards and even Quadro cards. (Reported as an RFE to NVIDIA as Bug ID 2417658.)
Anyway, this can lead to cuda_memtest throwing an [NVML] Error: Not supported (in nvmlDeviceGetSerial) exception, which we should catch.
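A minimal sketch of what catching the "Not supported" case could look like. To keep it compilable without nvml.h, it uses hypothetical stand-ins (FakeNvmlReturn, fakeDeviceGetSerial, serialOrPlaceholder) in place of the real nvmlReturn_t and nvmlDeviceGetSerial. The enum values mirror NVML_SUCCESS and NVML_ERROR_NOT_SUPPORTED. This is not the code in cuda_memtest, only an illustration of the idea.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for NVML's return codes, so the sketch builds without nvml.h.
enum FakeNvmlReturn { FAKE_NVML_SUCCESS = 0, FAKE_NVML_ERROR_NOT_SUPPORTED = 3 };

// Hypothetical stub that mimics the shape of nvmlDeviceGetSerial.
// 'supported' simulates whether the card implements the query.
FakeNvmlReturn fakeDeviceGetSerial(bool supported, char* serial, unsigned int length)
{
    if (!supported)
        return FAKE_NVML_ERROR_NOT_SUPPORTED;
    std::snprintf(serial, length, "%s", "0123456789");
    return FAKE_NVML_SUCCESS;
}

// Returns the serial, or a placeholder when the query is not supported.
// Any other failure is still treated as a real error.
std::string serialOrPlaceholder(bool supported)
{
    char buf[64] = {0};
    FakeNvmlReturn rc = fakeDeviceGetSerial(supported, buf, sizeof(buf));
    if (rc == FAKE_NVML_ERROR_NOT_SUPPORTED)
        return "unknown";
    if (rc != FAKE_NVML_SUCCESS)
        throw std::runtime_error("NVML query failed");
    return std::string(buf);
}
```

With the real library, the same branch would compare the result of nvmlDeviceGetSerial against NVML_ERROR_NOT_SUPPORTED before treating it as fatal.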
Testing on a GTX 950M, I get this while running PIConGPU:
</home/berceanu/src/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.3.0/picongpu-0.4.0-lqbxwsudtgms2do4ksm57uovvv4ypx4e/thirdParty/cuda_memtest/misc.cpp>:35
It seems to be just a warning, as the simulation completes after that.
Disabling the memtest fixes it:
pic-build -b "cuda:50" -c "-DCUDAMEMTEST_ENABLE=OFF"
Should we add a known issue to the docs for non-Tesla cards?
Thx for the report! Can you please post the warning? Is there a line missing?
Nope, there is only that single line.
Ah ok, but it does not abort, yes!
OK, we have to clean up that macro; it should not randomly start writing to cerr:
https://github.com/ComputationalRadiationPhysics/cuda_memtest/blob/7a585d504831431d0e95ff00d0217181201dbb12/cuda_memtest.h#L146-L150
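One possible shape for a quieter check, sketched under assumptions: the status codes, the LogSink type, checkStatus and CHECK_QUIET are all hypothetical names, and this is not the macro at the link above nor the change proposed in #18. The idea is that the caller supplies the log sink, and "not supported" is tolerated without any output.

```cpp
#include <functional>
#include <string>

// Hypothetical status codes; 0 and 3 mirror NVML_SUCCESS and NVML_ERROR_NOT_SUPPORTED.
enum Status { STATUS_OK = 0, STATUS_NOT_SUPPORTED = 3, STATUS_UNKNOWN = 999 };

// Hypothetical log sink: the caller decides where messages go, or drops them.
using LogSink = std::function<void(const std::string&)>;

// Returns true if execution may continue. "Not supported" passes silently;
// any other failure is reported through the sink and returns false.
inline bool checkStatus(Status rc, const char* what, const LogSink& sink)
{
    if (rc == STATUS_OK || rc == STATUS_NOT_SUPPORTED)
        return true;
    if (sink)
        sink(std::string(what) + " failed with code " + std::to_string(static_cast<int>(rc)));
    return false;
}

// Macro wrapper that stringifies the checked expression for the message.
#define CHECK_QUIET(call, sink) checkStatus((call), #call, (sink))
```

Passing an empty LogSink silences the check entirely, which keeps the decision about cerr output with the caller rather than inside the macro.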
I proposed a fix in #18 that should remove that noisy line from your output. It can (rightfully) be ignored.