NVSwitch power
I'm attempting to measure NVSwitch power usage using DCGM on a DGX-A100 machine:
❯ dcgmi group -l
+-------------------+----------------------------------------------------------+
| GROUPS |
| 2 groups found. |
+===================+==========================================================+
| Groups | |
| -> 0 | |
| -> Group ID | 0 |
| -> Group Name | DCGM_ALL_SUPPORTED_GPUS |
| -> Entities | GPU 0, GPU 1, GPU 2, GPU 3, GPU 4, GPU 5, GPU 6, GPU 7 |
| -> 1 | |
| -> Group ID | 1 |
| -> Group Name | DCGM_ALL_SUPPORTED_NVSWITCHES |
| -> Entities | Switch 12, Switch 10, Switch 9, Switch 11, Switch 8, Switch 13 |
+-------------------+----------------------------------------------------------+
❯ dcgmi dmon -g 1 -e 701,702,703,704
#Entity    SWVOLT  SWCUR  SCIDDQ  SCDVDD
ID
Switch 13  N/A     0      3       0
Switch 8   N/A     0      3       0
Switch 11  N/A     0      3       0
Switch 9   N/A     0      3       0
Switch 10  N/A     0      3       0
I have a couple of questions:
- The SWVOLT field always displays N/A, and the current fields (SWCUR, SCIDDQ, SCDVDD) never change. How can I get these fields to report live values?
- Which is the correct current value to use for the power calculation for NVSwitch (V * I)?
Thanks!
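For reference, the V * I calculation from the second question could be sketched as below. This is only an illustration: the helper names (`parse_reading`, `parse_dmon_row`, `switch_power_watts`) are hypothetical, and the units (millivolts for SWVOLT, milliamps for the current field) are assumptions that should be checked against the DCGM field documentation. It also shows why the output above cannot yield a power figure, since SWVOLT is N/A.

```python
from typing import List, Optional, Tuple


def parse_reading(token: str) -> Optional[float]:
    """Convert one dmon column to a float, or None when dcgmi prints N/A."""
    return None if token.strip().upper() == "N/A" else float(token)


def parse_dmon_row(line: str) -> Tuple[str, List[Optional[float]]]:
    """Split a row such as 'Switch 13 N/A 0 3 0' into entity name and values."""
    parts = line.split()
    entity = " ".join(parts[:2])            # e.g. "Switch 13"
    values = [parse_reading(p) for p in parts[2:]]
    return entity, values


def switch_power_watts(volt_mv: Optional[float],
                       current_ma: Optional[float]) -> Optional[float]:
    """P = V * I. ASSUMPTION: inputs are in millivolts and milliamps."""
    if volt_mv is None or current_ma is None:
        return None                          # cannot compute without both readings
    return (volt_mv / 1000.0) * (current_ma / 1000.0)


if __name__ == "__main__":
    entity, (swvolt, swcur, *_rest) = parse_dmon_row("Switch 13 N/A 0 3 0")
    print(entity, switch_power_watts(swvolt, swcur))
```

With the readings shown above, the function returns None for every switch because the voltage column is N/A.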
Unfortunately, SWVOLT is not supported on A100.
Could you please let me know which GPUs it is supported on? Also, how would I obtain the power reading on this system (my second question above)?