Don't see any transfers on NVLINK with NCCL all_sum on p3.8xlarge
With the following code, nvidia-smi nvlink -g 0 -i 0 reports zero bytes transmitted/received.
The same happens if I kick off the benchmarks with --all_reduce_spec=nccl --variable_update=replicated.
import argparse

import tensorflow as tf
from tensorflow.contrib.nccl import all_sum

# args.dim comes from the command line; a minimal parser is included here so
# the snippet runs standalone (the default value is arbitrary).
parser = argparse.ArgumentParser()
parser.add_argument('--dim', type=int, default=4096)
args = parser.parse_args()

# One variable per GPU, so the all-reduce has to move data between devices.
with tf.device('/gpu:0'):
    a = tf.get_variable(
        "a", initializer=tf.constant(1.0, shape=(args.dim, args.dim)))
with tf.device('/gpu:1'):
    b = tf.get_variable(
        "b", initializer=tf.constant(2.0, shape=(args.dim, args.dim)))
with tf.device('/gpu:0'):
    summed_node = all_sum([a, b])

sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
init = tf.global_variables_initializer()
sess.run(init)

with tf.device('/gpu:0'):
    summed = sess.run(summed_node)
My machine is an AWS p3.8xlarge instance, and my understanding is that this configuration supports NVLink.
The execution itself is fine, but when I check with nvidia-smi nvlink -g 0 -i 0 the link Tx/Rx counters are zero.
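In case it matters, here is how I read the counters around the run. This is just a minimal sketch that shells out to the same nvidia-smi nvlink -g 0 -i 0 command shown above; the read_nvlink_counters helper is only for this snippet, and it assumes sess and summed_node from the code above.

import subprocess

def read_nvlink_counters(gpu_index=0):
    # Same query as above: NVLink counter group 0 for the given GPU.
    out = subprocess.check_output(
        ["nvidia-smi", "nvlink", "-g", "0", "-i", str(gpu_index)])
    return out.decode()

before = read_nvlink_counters(0)
summed = sess.run(summed_node)  # the all_sum built in the snippet above
after = read_nvlink_counters(0)
print("Tx/Rx counters before run:\n" + before)
print("Tx/Rx counters after run:\n" + after)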
Here's the relevant configuration info (topology and link status):
(tensorflow_p36) ubuntu@ip-172-31-22-42:~$ nvidia-smi topo --matrix
        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      NV1     NV1     NV2     0-31
GPU1    NV1      X      NV2     NV1     0-31
GPU2    NV1     NV2      X      NV2     0-31
GPU3    NV2     NV1     NV2      X      0-31
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe switches (without traversing the PCIe Host Bridge)
PIX = Connection traversing a single PCIe switch
NV# = Connection traversing a bonded set of # NVLinks
(tensorflow_p36) ubuntu@ip-172-31-22-42:~$ nvidia-smi nvlink --status -i 0
GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-1a2670a5-1fdc-24df-2a79-ec6645f0d511)
Link 0: 25.781 GB/s
Link 1: 25.781 GB/s
Link 2: 25.781 GB/s
Link 3: 25.781 GB/s