sm_stress test is missing from dcgm-3.2.3
The sm_stress test seems to be missing from the latest release. When trying to run it explicitly, DCGM complains:
$ dcgmi diag -i 0,1,2,3 -v -r sm_stress --fail-early -p "sm_stress.target_stress=17000"
Invalid Parameter String: test 'sm_stress' does not match any loaded tests. Check logs for plugin failures.
The corresponding shared objects are no longer part of the RPM. The following files are missing:
/usr/share/nvidia-validation-suite/plugins/cuda12/libSmStress.so
/usr/share/nvidia-validation-suite/plugins/cuda12/libSmStress.so.3
/usr/share/nvidia-validation-suite/plugins/cuda12/libSmStress.so.3.1.8
The release notes for version 3.1.3 mention that sm_stress is no longer run as part of diagnostic levels 3 or 4, but do not mention the test being removed in 3.2.3.
The sources for version 3.2.3 have not been exported to GitHub yet.
(As a side note, the documentation for DCGM Diagnostics contradicts the release notes, since it still lists sm_stress as part of diagnostic levels 3 and 4.)
The "sm_stress" test was deprecated in 3.1.3 because its functionality is superseded by the "diagnostic" test. It was removed in 3.2.3. The "diagnostic" test is the recommended replacement.
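As a rough sketch of the switch, the failing command from the report can be rerun with only the test name changed. This assumes the same GPU list and flags still apply. The sm_stress.target_stress parameter is left out because it belonged to the removed plugin, and no equivalent parameter for the "diagnostic" test is assumed here. The variable names are illustrative only.

```shell
# Assumption: same GPU indices and flags as the original failing command.
gpus="0,1,2,3"

# Only the test name changes: sm_stress -> diagnostic.
# The sm_stress.target_stress parameter is dropped, since that plugin is gone.
diag_cmd="dcgmi diag -i $gpus -v -r diagnostic --fail-early"
echo "$diag_cmd"

# Run the command only when dcgmi is actually installed on this host.
if command -v dcgmi >/dev/null 2>&1; then
    $diag_cmd
else
    echo "dcgmi not found in PATH; command not run" >&2
fi
```

If the stress level needs tuning, the parameters accepted by the "diagnostic" test should be checked against the DCGM Diagnostics documentation for the installed version.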