
NVSwitch power

Open mutinifni opened this issue 2 years ago • 2 comments

I'm attempting to measure NVSwitch power usage using DCGM on a DGX-A100 machine:

❯ dcgmi group -l
+-------------------+----------------------------------------------------------+
| GROUPS                                                                       |
| 2 groups found.                                                              |
+===================+==========================================================+
| Groups            |                                                          |
| -> 0              |                                                          |
|    -> Group ID    | 0                                                        |
|    -> Group Name  | DCGM_ALL_SUPPORTED_GPUS                                  |
|    -> Entities    | GPU 0, GPU 1, GPU 2, GPU 3, GPU 4, GPU 5, GPU 6, GPU 7   |
| -> 1              |                                                          |
|    -> Group ID    | 1                                                        |
|    -> Group Name  | DCGM_ALL_SUPPORTED_NVSWITCHES                            |
|    -> Entities    | Switch 12, Switch 10, Switch 9, Switch 11, Switch 8, Switch 13 |
+-------------------+----------------------------------------------------------+

❯ dcgmi dmon -g 1 -e 701,702,703,704
#Entity   SWVOLT                      SWCUR                       SCIDDQ                      SCDVDD
ID
Switch 13 N/A                         0                           3                           0
Switch 8  N/A                         0                           3                           0
Switch 11 N/A                         0                           3                           0
Switch 9  N/A                         0                           3                           0
Switch 10 N/A                         0                           3                           0

I have a couple of questions:

  1. The SWVOLT field always displays N/A, and the current fields never change. How can I get these readings to work?
  2. Which of the current fields is the correct one to use for the NVSwitch power calculation (V * I)?
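
For reference, the V * I calculation in question 2 could be sketched roughly as below. This is only an illustration under stated assumptions: the helper names `parse_dmon` and `power_watts` are hypothetical, and the units (millivolts for voltage, milliamps for current) are assumed rather than confirmed for these fields. A voltage of N/A is treated as missing, so no power value is produced for it.

```python
from typing import Dict, List, Optional


def parse_dmon(text: str) -> Dict[str, List[Optional[float]]]:
    """Parse `dcgmi dmon` text output into {entity: [field values]}.

    Hypothetical helper: it only understands rows shaped like
    "Switch 13 N/A 0 3 0" and maps "N/A" to None.
    """
    rows: Dict[str, List[Optional[float]]] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the "#Entity ..." header, the "ID" line, and blank lines.
        if len(parts) < 3 or parts[0] != "Switch":
            continue
        entity = f"{parts[0]} {parts[1]}"
        rows[entity] = [None if p == "N/A" else float(p) for p in parts[2:]]
    return rows


def power_watts(voltage_mv: Optional[float],
                current_ma: Optional[float]) -> Optional[float]:
    """Return P = V * I in watts, or None if either reading is missing.

    ASSUMPTION: voltage is in millivolts and current in milliamps;
    the real units of these DCGM fields are not confirmed here.
    """
    if voltage_mv is None or current_ma is None:
        return None
    return (voltage_mv / 1000.0) * (current_ma / 1000.0)


if __name__ == "__main__":
    sample = (
        "#Entity   SWVOLT  SWCUR  SCIDDQ  SCDVDD\n"
        "ID\n"
        "Switch 13 N/A     0      3       0\n"
    )
    for entity, (volt, cur, *_rest) in parse_dmon(sample).items():
        print(entity, power_watts(volt, cur))
```

With the output shown above, every switch reports N/A for voltage, so this sketch yields no power value for any of them.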

Thanks!

mutinifni, Oct 14 '23 23:10

Unfortunately, SWVOLT is not supported on A100.

glowkey, Oct 16 '23 16:10

Could you please let me know which GPUs it is supported on? Also, how would I obtain the power reading? (Q2)

mutinifni, Oct 16 '23 17:10