DeepEP
Error when running test_internode.sh: deep_ep.cpp:83 'an illegal memory access was encountered'
[testing] Running with BF16, without top-k (async=False, previous=False) ... passed
[testing] Running with BF16, with top-k (async=False, previous=False) ... passed
[testing] Running with BF16, without top-k (async=False, previous=False) ... passed
[testing] Running with BF16, with top-k (async=False, previous=False) ... passed
[testing] Running with FP8, without top-k (async=False, previous=False) ... passed
[testing] Running with FP8, with top-k (async=False, previous=False) ... passed
[testing] Running with BF16, without top-k (async=True, previous=False) ... passed
[testing] Running with BF16, with top-k (async=True, previous=False) ... passed
[testing] Running with BF16, without top-k (async=True, previous=False) ... passed
[testing] Running with BF16, with top-k (async=True, previous=False) ... passed
[testing] Running with FP8, without top-k (async=True, previous=False) ... passed
[testing] Running with FP8, with top-k (async=True, previous=False) ... passed
[testing] Running with BF16, without top-k (async=False, previous=True) ... passed
[testing] Running with BF16, with top-k (async=False, previous=True) ... passed
[testing] Running with BF16, without top-k (async=False, previous=True) ... passed
[testing] Running with BF16, with top-k (async=False, previous=True) ... passed
[testing] Running with FP8, without top-k (async=False, previous=True) ... passed
[testing] Running with FP8, with top-k (async=False, previous=True) ... passed
[testing] Running with BF16, without top-k (async=True, previous=True) ... passed
[testing] Running with BF16, with top-k (async=True, previous=True) ... passed
[testing] Running with BF16, without top-k (async=True, previous=True) ... passed
[testing] Running with BF16, with top-k (async=True, previous=True) ... passed
[testing] Running with FP8, without top-k (async=True, previous=True) ... passed
[testing] Running with FP8, with top-k (async=True, previous=True) ... passed
[tuning] SMs 24, NVL chunk 4, RDMA chunk 4: 11.40 GB/s (RDMA), 37.23 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 8: 16.04 GB/s (RDMA), 52.36 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 12: 17.86 GB/s (RDMA), 58.30 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 16: 18.73 GB/s (RDMA), 61.14 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 20: 19.29 GB/s (RDMA), 62.96 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 24: 19.47 GB/s (RDMA), 63.55 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 28: 19.19 GB/s (RDMA), 62.62 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 32: 20.16 GB/s (RDMA), 65.79 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 4: 11.65 GB/s (RDMA), 38.04 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 8: 16.41 GB/s (RDMA), 53.58 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 12: 18.14 GB/s (RDMA), 59.22 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 16: 18.96 GB/s (RDMA), 61.89 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 20: 19.64 GB/s (RDMA), 64.11 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 24: 18.94 GB/s (RDMA), 61.83 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 28: 19.17 GB/s (RDMA), 62.59 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 32: 19.71 GB/s (RDMA), 64.32 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 4: 11.63 GB/s (RDMA), 37.97 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 8: 15.90 GB/s (RDMA), 51.89 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 12: 17.78 GB/s (RDMA), 58.02 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 16: 18.91 GB/s (RDMA), 61.73 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 20: 19.61 GB/s (RDMA), 64.00 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 24: 19.66 GB/s (RDMA), 64.16 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 28: 18.89 GB/s (RDMA), 61.67 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 32: 19.64 GB/s (RDMA), 64.10 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 4: 11.72 GB/s (RDMA), 38.24 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 8: 16.32 GB/s (RDMA), 53.28 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 12: 18.23 GB/s (RDMA), 59.52 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 16: 19.15 GB/s (RDMA), 62.51 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 20: 19.44 GB/s (RDMA), 63.45 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 24: 19.82 GB/s (RDMA), 64.68 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 28: 19.21 GB/s (RDMA), 62.71 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 32: 19.14 GB/s (RDMA), 62.48 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 4: 11.76 GB/s (RDMA), 38.39 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 8: 16.45 GB/s (RDMA), 53.70 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 12: 18.26 GB/s (RDMA), 59.61 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 16: 19.11 GB/s (RDMA), 62.36 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 20: 19.21 GB/s (RDMA), 62.69 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 24: 19.73 GB/s (RDMA), 64.41 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 28: 18.47 GB/s (RDMA), 60.30 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 32: 19.43 GB/s (RDMA), 63.41 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 4: 11.83 GB/s (RDMA), 38.63 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 8: 15.69 GB/s (RDMA), 51.21 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 12: 17.93 GB/s (RDMA), 58.52 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 16: 18.72 GB/s (RDMA), 61.09 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 20: 19.59 GB/s (RDMA), 63.95 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 24: 19.50 GB/s (RDMA), 63.65 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 28: 19.92 GB/s (RDMA), 65.03 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 32: 19.66 GB/s (RDMA), 64.16 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 4: 11.91 GB/s (RDMA), 38.88 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 8: 16.13 GB/s (RDMA), 52.64 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 12: 17.64 GB/s (RDMA), 57.56 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 16: 18.50 GB/s (RDMA), 60.37 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 20: 19.63 GB/s (RDMA), 64.07 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 24: 19.70 GB/s (RDMA), 64.30 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 28: 19.67 GB/s (RDMA), 64.20 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 32: 9.39 GB/s (RDMA), 30.64 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 4: 11.90 GB/s (RDMA), 38.85 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 8: 16.47 GB/s (RDMA), 53.77 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 12: 18.30 GB/s (RDMA), 59.73 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 16: 19.09 GB/s (RDMA), 62.32 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 20: 19.43 GB/s (RDMA), 63.43 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 24: 19.53 GB/s (RDMA), 63.76 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 28: 19.63 GB/s (RDMA), 64.09 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 32: 20.02 GB/s (RDMA), 65.35 GB/s (NVL)
[tuning] Best dispatch (FP8): SMs 24, NVL chunk 4, RDMA chunk 32: 20.16 GB/s (RDMA), 65.79 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 4: 16.26 GB/s (RDMA), 53.09 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 8: 18.76 GB/s (RDMA), 61.22 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 12: 20.10 GB/s (RDMA), 65.59 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 16: 10.41 GB/s (RDMA), 33.97 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 20: 4.99 GB/s (RDMA), 16.27 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 24: 3.56 GB/s (RDMA), 11.62 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 28: 14.20 GB/s (RDMA), 46.35 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 32: 19.41 GB/s (RDMA), 63.35 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 4: 16.46 GB/s (RDMA), 53.73 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 8: 19.04 GB/s (RDMA), 62.14 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 12: 19.85 GB/s (RDMA), 64.80 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 16: 20.05 GB/s (RDMA), 65.43 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 20: 5.41 GB/s (RDMA), 17.66 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 24: 10.07 GB/s (RDMA), 32.87 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 28: 12.51 GB/s (RDMA), 40.83 GB/s (NVL)
[tuning] SMs 24, NVL chunk 8, RDMA chunk 32: 20.87 GB/s (RDMA), 68.13 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 4: 16.24 GB/s (RDMA), 53.02 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 8: 18.74 GB/s (RDMA), 61.17 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 12: 20.23 GB/s (RDMA), 66.03 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 16: 18.48 GB/s (RDMA), 60.32 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 20: 9.31 GB/s (RDMA), 30.37 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 24: 6.25 GB/s (RDMA), 20.40 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 28: 13.34 GB/s (RDMA), 43.53 GB/s (NVL)
[tuning] SMs 24, NVL chunk 12, RDMA chunk 32: 18.52 GB/s (RDMA), 60.44 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 4: 16.28 GB/s (RDMA), 53.13 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 8: 18.92 GB/s (RDMA), 61.76 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 12: 20.25 GB/s (RDMA), 66.09 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 16: 20.60 GB/s (RDMA), 67.23 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 20: 6.19 GB/s (RDMA), 20.20 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 24: 18.09 GB/s (RDMA), 59.04 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 28: 19.41 GB/s (RDMA), 63.34 GB/s (NVL)
[tuning] SMs 24, NVL chunk 16, RDMA chunk 32: 21.26 GB/s (RDMA), 69.39 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 4: 16.36 GB/s (RDMA), 53.39 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 8: 18.96 GB/s (RDMA), 61.88 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 12: 20.23 GB/s (RDMA), 66.02 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 16: 20.11 GB/s (RDMA), 65.64 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 20: 14.14 GB/s (RDMA), 46.16 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 24: 14.96 GB/s (RDMA), 48.82 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 28: 14.66 GB/s (RDMA), 47.84 GB/s (NVL)
[tuning] SMs 24, NVL chunk 20, RDMA chunk 32: 21.13 GB/s (RDMA), 68.96 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 4: 16.35 GB/s (RDMA), 53.36 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 8: 19.00 GB/s (RDMA), 62.02 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 12: 19.86 GB/s (RDMA), 64.83 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 16: 20.64 GB/s (RDMA), 67.37 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 20: 19.63 GB/s (RDMA), 64.08 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 24: 12.58 GB/s (RDMA), 41.06 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 28: 13.21 GB/s (RDMA), 43.10 GB/s (NVL)
[tuning] SMs 24, NVL chunk 24, RDMA chunk 32: 21.18 GB/s (RDMA), 69.13 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 4: 16.27 GB/s (RDMA), 53.10 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 8: 18.93 GB/s (RDMA), 61.80 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 12: 20.18 GB/s (RDMA), 65.85 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 16: 20.59 GB/s (RDMA), 67.22 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 20: 19.96 GB/s (RDMA), 65.14 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 24: 12.15 GB/s (RDMA), 39.67 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 28: 19.53 GB/s (RDMA), 63.74 GB/s (NVL)
[tuning] SMs 24, NVL chunk 28, RDMA chunk 32: 20.87 GB/s (RDMA), 68.11 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 4: 16.19 GB/s (RDMA), 52.85 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 8: 19.06 GB/s (RDMA), 62.20 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 12: 20.17 GB/s (RDMA), 65.83 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 16: 20.12 GB/s (RDMA), 65.69 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 20: 19.95 GB/s (RDMA), 65.12 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 24: 17.83 GB/s (RDMA), 58.19 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 28: 21.06 GB/s (RDMA), 68.72 GB/s (NVL)
[tuning] SMs 24, NVL chunk 32, RDMA chunk 32: 21.06 GB/s (RDMA), 68.72 GB/s (NVL)
[tuning] Best dispatch (BF16): SMs 24, NVL chunk 16, RDMA chunk 32: 21.26 GB/s (RDMA), 69.39 GB/s (NVL)
[tuning] SMs 24, NVL chunk 1, RDMA chunk 8: 18.21 GB/s (RDMA), 59.44 GB/s (NVL)
[tuning] SMs 24, NVL chunk 1, RDMA chunk 12: 19.62 GB/s (RDMA), 64.02 GB/s (NVL)
[tuning] SMs 24, NVL chunk 1, RDMA chunk 16: 19.91 GB/s (RDMA), 65.00 GB/s (NVL)
[tuning] SMs 24, NVL chunk 1, RDMA chunk 20: 20.07 GB/s (RDMA), 65.49 GB/s (NVL)
[tuning] SMs 24, NVL chunk 1, RDMA chunk 24: 18.60 GB/s (RDMA), 60.70 GB/s (NVL)
[tuning] SMs 24, NVL chunk 1, RDMA chunk 28: 20.61 GB/s (RDMA), 67.26 GB/s (NVL)
[tuning] SMs 24, NVL chunk 1, RDMA chunk 32: 20.70 GB/s (RDMA), 67.57 GB/s (NVL)
[tuning] SMs 24, NVL chunk 2, RDMA chunk 8: 18.00 GB/s (RDMA), 58.76 GB/s (NVL)
[tuning] SMs 24, NVL chunk 2, RDMA chunk 12: 19.60 GB/s (RDMA), 63.97 GB/s (NVL)
[tuning] SMs 24, NVL chunk 2, RDMA chunk 16: 20.25 GB/s (RDMA), 66.10 GB/s (NVL)
[tuning] SMs 24, NVL chunk 2, RDMA chunk 20: 18.59 GB/s (RDMA), 60.69 GB/s (NVL)
[tuning] SMs 24, NVL chunk 2, RDMA chunk 24: 20.63 GB/s (RDMA), 67.34 GB/s (NVL)
[tuning] SMs 24, NVL chunk 2, RDMA chunk 28: 17.65 GB/s (RDMA), 57.61 GB/s (NVL)
[tuning] SMs 24, NVL chunk 2, RDMA chunk 32: 20.81 GB/s (RDMA), 67.93 GB/s (NVL)
[tuning] SMs 24, NVL chunk 3, RDMA chunk 8: 17.99 GB/s (RDMA), 58.73 GB/s (NVL)
[tuning] SMs 24, NVL chunk 3, RDMA chunk 12: 19.49 GB/s (RDMA), 63.63 GB/s (NVL)
[tuning] SMs 24, NVL chunk 3, RDMA chunk 16: 20.35 GB/s (RDMA), 66.43 GB/s (NVL)
[tuning] SMs 24, NVL chunk 3, RDMA chunk 20: 20.11 GB/s (RDMA), 65.65 GB/s (NVL)
[tuning] SMs 24, NVL chunk 3, RDMA chunk 24: 19.31 GB/s (RDMA), 63.01 GB/s (NVL)
[tuning] SMs 24, NVL chunk 3, RDMA chunk 28: 20.75 GB/s (RDMA), 67.72 GB/s (NVL)
[tuning] SMs 24, NVL chunk 3, RDMA chunk 32: 20.91 GB/s (RDMA), 68.25 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 8: 17.93 GB/s (RDMA), 58.53 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 12: 19.50 GB/s (RDMA), 63.65 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 16: 20.28 GB/s (RDMA), 66.20 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 20: 20.51 GB/s (RDMA), 66.93 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 24: 20.26 GB/s (RDMA), 66.12 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 28: 20.63 GB/s (RDMA), 67.35 GB/s (NVL)
[tuning] SMs 24, NVL chunk 4, RDMA chunk 32: 20.30 GB/s (RDMA), 66.26 GB/s (NVL)
[tuning] Best combine: SMs 24, NVL chunk 3, RDMA chunk 32: 20.91 GB/s (RDMA), 68.25 GB/s (NVL)
terminate called after throwing an instance of 'EPException'
what(): Failed: CUDA error /sharedata/msm/workspace/DeepEP/csrc/deep_ep.cpp:83 'an illegal memory access was encountered'
[the two lines above are printed 8 times in total, once per local rank]
W0514 11:01:33.640000 140559862404928 torch/multiprocessing/spawn.py:146] Terminating process 171 via signal SIGTERM
W0514 11:01:33.641000 140559862404928 torch/multiprocessing/spawn.py:146] Terminating process 172 via signal SIGTERM
W0514 11:01:33.641000 140559862404928 torch/multiprocessing/spawn.py:146] Terminating process 173 via signal SIGTERM
W0514 11:01:33.641000 140559862404928 torch/multiprocessing/spawn.py:146] Terminating process 174 via signal SIGTERM
W0514 11:01:33.641000 140559862404928 torch/multiprocessing/spawn.py:146] Terminating process 175 via signal SIGTERM
W0514 11:01:33.641000 140559862404928 torch/multiprocessing/spawn.py:146] Terminating process 176 via signal SIGTERM
W0514 11:01:33.641000 140559862404928 torch/multiprocessing/spawn.py:146] Terminating process 178 via signal SIGTERM
Traceback (most recent call last):
File "/sharedata/msm/workspace/DeepEP/tests/test_internode.py", line 247, in
-- Process 6 terminated with the following error:
Traceback (most recent call last):
  File "/home/aigc/miniforge3/envs/mamba/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 76, in _wrap
    fn(i, *args)
  File "/sharedata/msm/workspace/DeepEP/tests/test_internode.py", line 242, in test_loop
    test_low_latency.test_main(ll_num_tokens, ll_hidden, ll_num_experts, ll_num_topk, rank, num_ranks, group, buffer, seed=1)
  File "/sharedata/msm/workspace/DeepEP/tests/test_low_latency.py", line 40, in test_main
    buffer.low_latency_dispatch(x, topk_idx, num_tokens, num_experts, use_fp8=dispatch_use_fp8,
  File "/home/aigc/miniforge3/envs/mamba/lib/python3.10/site-packages/deep_ep-1.0.0+bb393e7-py3.10-linux-x86_64.egg/deep_ep/buffer.py", line 487, in low_latency_dispatch
    self.runtime.low_latency_dispatch(x, topk_idx,
RuntimeError: Failed: CUDA error /sharedata/msm/workspace/DeepEP/csrc/kernels/internode_ll.cu:341 'an illegal memory access was encountered'
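Note that the error is reported asynchronously: the check at deep_ep.cpp:83 only surfaces a fault raised earlier by the low-latency dispatch kernel (internode_ll.cu:341). A debugging sketch for localizing the faulting access (assuming the test layout shown in the traceback above; compute-sanitizer ships with the CUDA toolkit, so this is a generic suggestion, not a DeepEP-specific recipe):

```shell
# Make kernel launches synchronous so the Python traceback points at the
# launch that actually faults, rather than at a later CUDA call.
export CUDA_LAUNCH_BLOCKING=1

# Compute Sanitizer's memcheck tool reports the faulting kernel name and
# the offending address/thread for the illegal memory access.
if command -v compute-sanitizer >/dev/null 2>&1; then
    compute-sanitizer --tool memcheck python tests/test_low_latency.py
fi
```

Running under the sanitizer is much slower, so it is best done with the smallest token/expert configuration that still reproduces the crash.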
shortk8snode10:171:171 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:171:171 [0] NCCL INFO Bootstrap : Using eth0:192.168.128.8<0>
shortk8snode10:171:171 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
shortk8snode10:171:171 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
shortk8snode10:175:175 [4] NCCL INFO cudaDriverVersion 12040
shortk8snode10:175:175 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:175:175 [4] NCCL INFO Bootstrap : Using eth0:192.168.128.8<0>
shortk8snode10:175:175 [4] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
shortk8snode10:173:173 [2] NCCL INFO cudaDriverVersion 12040
shortk8snode10:173:173 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:173:173 [2] NCCL INFO Bootstrap : Using eth0:192.168.128.8<0>
shortk8snode10:173:173 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
shortk8snode10:174:174 [3] NCCL INFO cudaDriverVersion 12040
shortk8snode10:174:174 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:178:178 [7] NCCL INFO cudaDriverVersion 12040
shortk8snode10:178:178 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:174:174 [3] NCCL INFO Bootstrap : Using eth0:192.168.128.8<0>
shortk8snode10:178:178 [7] NCCL INFO Bootstrap : Using eth0:192.168.128.8<0>
shortk8snode10:174:174 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
shortk8snode10:178:178 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
shortk8snode10:172:172 [1] NCCL INFO cudaDriverVersion 12040
shortk8snode10:172:172 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:172:172 [1] NCCL INFO Bootstrap : Using eth0:192.168.128.8<0>
shortk8snode10:172:172 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
shortk8snode10:176:176 [5] NCCL INFO cudaDriverVersion 12040
shortk8snode10:176:176 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:177:177 [6] NCCL INFO cudaDriverVersion 12040
shortk8snode10:177:177 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:176:176 [5] NCCL INFO Bootstrap : Using eth0:192.168.128.8<0>
shortk8snode10:177:177 [6] NCCL INFO Bootstrap : Using eth0:192.168.128.8<0>
shortk8snode10:176:176 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
shortk8snode10:177:177 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
shortk8snode10:171:733 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
shortk8snode10:171:733 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:171:733 [0] NCCL INFO NCCL_IB_HCA set to mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_6:1,mlx5_7:1
shortk8snode10:175:734 [4] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
shortk8snode10:175:734 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:175:734 [4] NCCL INFO NCCL_IB_HCA set to mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_6:1,mlx5_7:1
shortk8snode10:173:735 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
shortk8snode10:173:735 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:174:736 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
shortk8snode10:174:736 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:178:737 [7] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
shortk8snode10:178:737 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:173:735 [2] NCCL INFO NCCL_IB_HCA set to mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_6:1,mlx5_7:1
shortk8snode10:174:736 [3] NCCL INFO NCCL_IB_HCA set to mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_6:1,mlx5_7:1
shortk8snode10:178:737 [7] NCCL INFO NCCL_IB_HCA set to mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_6:1,mlx5_7:1
shortk8snode10:172:738 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
shortk8snode10:172:738 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:171:733 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [4]mlx5_6:1/RoCE [5]mlx5_7:1/RoCE [RO]; OOB eth0:192.168.128.8<0>
shortk8snode10:171:733 [0] NCCL INFO Using non-device net plugin version 0
shortk8snode10:171:733 [0] NCCL INFO Using network IB
shortk8snode10:176:739 [5] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
shortk8snode10:177:740 [6] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
shortk8snode10:176:739 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:177:740 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
shortk8snode10:175:734 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [4]mlx5_6:1/RoCE [5]mlx5_7:1/RoCE [RO]; OOB eth0:192.168.128.8<0>
shortk8snode10:175:734 [4] NCCL INFO Using non-device net plugin version 0
shortk8snode10:175:734 [4] NCCL INFO Using network IB
shortk8snode10:173:735 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [4]mlx5_6:1/RoCE [5]mlx5_7:1/RoCE [RO]; OOB eth0:192.168.128.8<0>
shortk8snode10:173:735 [2] NCCL INFO Using non-device net plugin version 0
shortk8snode10:173:735 [2] NCCL INFO Using network IB
shortk8snode10:178:737 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [4]mlx5_6:1/RoCE [5]mlx5_7:1/RoCE [RO]; OOB eth0:192.168.128.8<0>
shortk8snode10:174:736 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [4]mlx5_6:1/RoCE [5]mlx5_7:1/RoCE [RO]; OOB eth0:192.168.128.8<0>
shortk8snode10:178:737 [7] NCCL INFO Using non-device net plugin version 0
shortk8snode10:178:737 [7] NCCL INFO Using network IB
shortk8snode10:174:736 [3] NCCL INFO Using non-device net plugin version 0
shortk8snode10:174:736 [3] NCCL INFO Using network IB
shortk8snode10:172:738 [1] NCCL INFO NCCL_IB_HCA set to mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_6:1,mlx5_7:1
shortk8snode10:177:740 [6] NCCL INFO NCCL_IB_HCA set to mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_6:1,mlx5_7:1
shortk8snode10:176:739 [5] NCCL INFO NCCL_IB_HCA set to mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_6:1,mlx5_7:1
shortk8snode10:172:738 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [4]mlx5_6:1/RoCE [5]mlx5_7:1/RoCE [RO]; OOB eth0:192.168.128.8<0>
shortk8snode10:172:738 [1] NCCL INFO Using non-device net plugin version 0
shortk8snode10:172:738 [1] NCCL INFO Using network IB
shortk8snode10:177:740 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [4]mlx5_6:1/RoCE [5]mlx5_7:1/RoCE [RO]; OOB eth0:192.168.128.8<0>
shortk8snode10:176:739 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [4]mlx5_6:1/RoCE [5]mlx5_7:1/RoCE [RO]; OOB eth0:192.168.128.8<0>
shortk8snode10:177:740 [6] NCCL INFO Using non-device net plugin version 0
shortk8snode10:177:740 [6] NCCL INFO Using network IB
shortk8snode10:176:739 [5] NCCL INFO Using non-device net plugin version 0
shortk8snode10:176:739 [5] NCCL INFO Using network IB
shortk8snode10:171:733 [0] NCCL INFO comm 0x5603a5cca990 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 61000 commId 0x1ae93ec1860acd74 - Init START
shortk8snode10:174:736 [3] NCCL INFO comm 0x55ca7b3c5270 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId 67000 commId 0x1ae93ec1860acd74 - Init START
shortk8snode10:173:735 [2] NCCL INFO comm 0x55db3e681eb0 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 65000 commId 0x1ae93ec1860acd74 - Init START
shortk8snode10:172:738 [1] NCCL INFO comm 0x5575b9b0b3f0 rank 1 nranks 16 cudaDev 1 nvmlDev 1 busId 63000 commId 0x1ae93ec1860acd74 - Init START
shortk8snode10:175:734 [4] NCCL INFO comm 0x55a12f01d300 rank 4 nranks 16 cudaDev 4 nvmlDev 4 busId a1000 commId 0x1ae93ec1860acd74 - Init START
shortk8snode10:176:739 [5] NCCL INFO comm 0x55d35dd52f80 rank 5 nranks 16 cudaDev 5 nvmlDev 5 busId a3000 commId 0x1ae93ec1860acd74 - Init START
shortk8snode10:173:735 [2] NCCL INFO MNNVL busId 0x65000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
shortk8snode10:177:740 [6] NCCL INFO comm 0x562b1dd2be40 rank 6 nranks 16 cudaDev 6 nvmlDev 6 busId a5000 commId 0x1ae93ec1860acd74 - Init START
shortk8snode10:172:738 [1] NCCL INFO MNNVL busId 0x63000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
shortk8snode10:175:734 [4] NCCL INFO MNNVL busId 0xa1000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
shortk8snode10:171:733 [0] NCCL INFO MNNVL busId 0x61000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
shortk8snode10:178:737 [7] NCCL INFO comm 0x55e8b4736a90 rank 7 nranks 16 cudaDev 7 nvmlDev 7 busId a7000 commId 0x1ae93ec1860acd74 - Init START
shortk8snode10:176:739 [5] NCCL INFO MNNVL busId 0xa3000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
shortk8snode10:174:736 [3] NCCL INFO MNNVL busId 0x67000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
shortk8snode10:178:737 [7] NCCL INFO MNNVL busId 0xa7000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
shortk8snode10:177:740 [6] NCCL INFO MNNVL busId 0xa5000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
shortk8snode10:171:733 [0] NCCL INFO Setting affinity for GPU 0 to 03ffffff,ffffffff,ffffffff
shortk8snode10:171:733 [0] NCCL INFO NVLS multicast support is available on dev 0
shortk8snode10:173:735 [2] NCCL INFO Setting affinity for GPU 2 to 03ffffff,ffffffff,ffffffff
shortk8snode10:173:735 [2] NCCL INFO NVLS multicast support is available on dev 2
shortk8snode10:174:736 [3] NCCL INFO Setting affinity for GPU 3 to 03ffffff,ffffffff,ffffffff
shortk8snode10:172:738 [1] NCCL INFO Setting affinity for GPU 1 to 03ffffff,ffffffff,ffffffff
shortk8snode10:172:738 [1] NCCL INFO NVLS multicast support is available on dev 1
shortk8snode10:174:736 [3] NCCL INFO NVLS multicast support is available on dev 3
shortk8snode10:178:737 [7] NCCL INFO Setting affinity for GPU 7 to 0fffff,ffffffff,ffffffff,fc000000,00000000,00000000
shortk8snode10:178:737 [7] NCCL INFO NVLS multicast support is available on dev 7
shortk8snode10:176:739 [5] NCCL INFO Setting affinity for GPU 5 to 0fffff,ffffffff,ffffffff,fc000000,00000000,00000000
shortk8snode10:177:740 [6] NCCL INFO Setting affinity for GPU 6 to 0fffff,ffffffff,ffffffff,fc000000,00000000,00000000
shortk8snode10:176:739 [5] NCCL INFO NVLS multicast support is available on dev 5
shortk8snode10:177:740 [6] NCCL INFO NVLS multicast support is available on dev 6
shortk8snode10:175:734 [4] NCCL INFO Setting affinity for GPU 4 to 0fffff,ffffffff,ffffffff,fc000000,00000000,00000000
shortk8snode10:175:734 [4] NCCL INFO NVLS multicast support is available on dev 4
shortk8snode10:178:737 [7] NCCL INFO comm 0x55e8b4736a90 rank 7 nRanks 16 nNodes 2 localRanks 8 localRank 7 MNNVL 0
shortk8snode10:176:739 [5] NCCL INFO comm 0x55d35dd52f80 rank 5 nRanks 16 nNodes 2 localRanks 8 localRank 5 MNNVL 0
shortk8snode10:174:736 [3] NCCL INFO comm 0x55ca7b3c5270 rank 3 nRanks 16 nNodes 2 localRanks 8 localRank 3 MNNVL 0
shortk8snode10:178:737 [7] NCCL INFO NVLS Head 0: 0 8
shortk8snode10:176:739 [5] NCCL INFO NVLS Head 0: 0 8
shortk8snode10:177:740 [6] NCCL INFO comm 0x562b1dd2be40 rank 6 nRanks 16 nNodes 2 localRanks 8 localRank 6 MNNVL 0
shortk8snode10:178:737 [7] NCCL INFO NVLS Head 1: 1 9
shortk8snode10:176:739 [5] NCCL INFO NVLS Head 1: 1 9
shortk8snode10:178:737 [7] NCCL INFO NVLS Head 2: 2 10
shortk8snode10:172:738 [1] NCCL INFO comm 0x5575b9b0b3f0 rank 1 nRanks 16 nNodes 2 localRanks 8 localRank 1 MNNVL 0
shortk8snode10:176:739 [5] NCCL INFO NVLS Head 2: 2 10
shortk8snode10:173:735 [2] NCCL INFO comm 0x55db3e681eb0 rank 2 nRanks 16 nNodes 2 localRanks 8 localRank 2 MNNVL 0
shortk8snode10:178:737 [7] NCCL INFO NVLS Head 3: 4 12
shortk8snode10:176:739 [5] NCCL INFO NVLS Head 3: 4 12
shortk8snode10:178:737 [7] NCCL INFO NVLS Head 4: 5 13
shortk8snode10:175:734 [4] NCCL INFO comm 0x55a12f01d300 rank 4 nRanks 16 nNodes 2 localRanks 8 localRank 4 MNNVL 0
shortk8snode10:177:740 [6] NCCL INFO NVLS Head 0: 0 8
shortk8snode10:174:736 [3] NCCL INFO NVLS Head 0: 0 8
shortk8snode10:178:737 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] 0/-1/-1->7->6 [2] 0/-1/-1->7->6 [3] 0/-1/-1->7->6 [4] 0/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] 0/-1/-1->7->6 [7] 0/-1/-1->7->6 [8] 0/-1/-1->7->6 [9] 0/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] 0/-1/-1->7->6 [12] 0/-1/-1->7->6 [13] 0/-1/-1->7->6 [14] 0/-1/-1->7->6 [15] -1/-1/-1->7->6
shortk8snode10:176:739 [5] NCCL INFO NVLS Head 4: 5 13
shortk8snode10:172:738 [1] NCCL INFO NVLS Head 0: 0 8
shortk8snode10:177:740 [6] NCCL INFO NVLS Head 1: 1 9
shortk8snode10:171:733 [0] NCCL INFO comm 0x5603a5cca990 rank 0 nRanks 16 nNodes 2 localRanks 8 localRank 0 MNNVL 0
shortk8snode10:174:736 [3] NCCL INFO NVLS Head 1: 1 9
shortk8snode10:178:737 [7] NCCL INFO P2P Chunksize set to 131072
shortk8snode10:173:735 [2] NCCL INFO NVLS Head 0: 0 8
shortk8snode10:172:738 [1] NCCL INFO NVLS Head 1: 1 9
shortk8snode10:177:740 [6] NCCL INFO NVLS Head 2: 2 10
shortk8snode10:176:739 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/13/-1->5->-1 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->13 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/13/-1->5->-1 [15] 6/-1/-1->5->4
shortk8snode10:175:734 [4] NCCL INFO NVLS Head 0: 0 8
shortk8snode10:174:736 [3] NCCL INFO NVLS Head 2: 2 10
shortk8snode10:173:735 [2] NCCL INFO NVLS Head 1: 1 9
shortk8snode10:172:738 [1] NCCL INFO NVLS Head 2: 2 10
shortk8snode10:177:740 [6] NCCL INFO NVLS Head 3: 4 12
shortk8snode10:176:739 [5] NCCL INFO P2P Chunksize set to 131072
shortk8snode10:171:733 [0] NCCL INFO NVLS Head 0: 0 8
shortk8snode10:175:734 [4] NCCL INFO NVLS Head 1: 1 9
shortk8snode10:174:736 [3] NCCL INFO NVLS Head 3: 4 12
shortk8snode10:173:735 [2] NCCL INFO NVLS Head 2: 2 10
shortk8snode10:172:738 [1] NCCL INFO NVLS Head 3: 4 12
shortk8snode10:177:740 [6] NCCL INFO NVLS Head 4: 5 13
shortk8snode10:171:733 [0] NCCL INFO NVLS Head 1: 1 9
shortk8snode10:175:734 [4] NCCL INFO NVLS Head 2: 2 10
shortk8snode10:174:736 [3] NCCL INFO NVLS Head 4: 5 13
shortk8snode10:173:735 [2] NCCL INFO NVLS Head 3: 4 12
shortk8snode10:172:738 [1] NCCL INFO NVLS Head 4: 5 13
shortk8snode10:171:733 [0] NCCL INFO NVLS Head 2: 2 10
shortk8snode10:175:734 [4] NCCL INFO NVLS Head 3: 4 12
shortk8snode10:177:740 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5
shortk8snode10:173:735 [2] NCCL INFO NVLS Head 4: 5 13
shortk8snode10:171:733 [0] NCCL INFO NVLS Head 3: 4 12
shortk8snode10:175:734 [4] NCCL INFO NVLS Head 4: 5 13
shortk8snode10:174:736 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] -1/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] -1/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] -1/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2
shortk8snode10:172:738 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/9/-1->1->-1 [2] -1/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->9 [7] -1/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/9/-1->1->-1 [12] -1/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0
shortk8snode10:177:740 [6] NCCL INFO P2P Chunksize set to 131072
shortk8snode10:171:733 [0] NCCL INFO NVLS Head 4: 5 13
shortk8snode10:173:735 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/10/-1->2->-1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->10 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/10/-1->2->-1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1
shortk8snode10:174:736 [3] NCCL INFO P2P Chunksize set to 131072
shortk8snode10:175:734 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/12/-1->4->-1 [4] -1/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->12 [9] -1/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/12/-1->4->-1 [14] -1/-1/-1->4->3 [15] 5/-1/-1->4->3
shortk8snode10:172:738 [1] NCCL INFO P2P Chunksize set to 131072
shortk8snode10:171:733 [0] NCCL INFO Channel 00/16 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
shortk8snode10:173:735 [2] NCCL INFO P2P Chunksize set to 131072
shortk8snode10:175:734 [4] NCCL INFO P2P Chunksize set to 131072
shortk8snode10:171:733 [0] NCCL INFO Channel 01/16 : 0 7 6 5 4 3 2 9 8 15 14 13 12 11 10 1
shortk8snode10:171:733 [0] NCCL INFO Channel 02/16 : 0 7 6 5 4 3 10 9 8 15 14 13 12 11 2 1
shortk8snode10:171:733 [0] NCCL INFO Channel 03/16 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
shortk8snode10:171:733 [0] NCCL INFO Channel 04/16 : 0 7 6 13 12 11 10 9 8 15 14 5 4 3 2 1
shortk8snode10:171:733 [0] NCCL INFO Channel 05/16 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
shortk8snode10:171:733 [0] NCCL INFO Channel 06/16 : 0 7 6 5 4 3 2 9 8 15 14 13 12 11 10 1
shortk8snode10:171:733 [0] NCCL INFO Channel 07/16 : 0 7 6 5 4 3 10 9 8 15 14 13 12 11 2 1
shortk8snode10:171:733 [0] NCCL INFO Channel 08/16 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
shortk8snode10:171:733 [0] NCCL INFO Channel 09/16 : 0 7 6 13 12 11 10 9 8 15 14 5 4 3 2 1
shortk8snode10:171:733 [0] NCCL INFO Channel 10/16 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
shortk8snode10:171:733 [0] NCCL INFO Channel 11/16 : 0 7 6 5 4 3 2 9 8 15 14 13 12 11 10 1
shortk8snode10:171:733 [0] NCCL INFO Channel 12/16 : 0 7 6 5 4 3 10 9 8 15 14 13 12 11 2 1
shortk8snode10:171:733 [0] NCCL INFO Channel 13/16 : 0 7 6 5 12 11 10 9 8 15 14 13 4 3 2 1
shortk8snode10:171:733 [0] NCCL INFO Channel 14/16 : 0 7 6 13 12 11 10 9 8 15 14 5 4 3 2 1
shortk8snode10:171:733 [0] NCCL INFO Channel 15/16 : 0 7 6 5 4 3 2 1 8 15 14 13 12 11 10 9
shortk8snode10:171:733 [0] NCCL INFO Trees [0] 1/8/-1->0->-1 [1] -1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->7 [5] 1/-1/-1->0->8 [6] -1/-1/-1->0->7 [7] 1/-1/-1->0->7 [8] 1/-1/-1->0->7 [9] 1/-1/-1->0->7 [10] 1/8/-1->0->-1 [11] -1/-1/-1->0->7 [12] 1/-1/-1->0->7 [13] 1/-1/-1->0->7 [14] 1/-1/-1->0->7 [15]
1/-1/-1->0->8 shortk8snode10:171:733 [0] NCCL INFO P2P Chunksize set to 131072 shortk8snode10:176:739 [5] NCCL INFO Channel 04/0 : 14[6] -> 5[5] [receive] via NET/IB/5 shortk8snode10:176:739 [5] NCCL INFO Channel 09/0 : 14[6] -> 5[5] [receive] via NET/IB/5 shortk8snode10:176:739 [5] NCCL INFO Channel 14/0 : 14[6] -> 5[5] [receive] via NET/IB/5 shortk8snode10:176:739 [5] NCCL INFO Channel 03/0 : 5[5] -> 12[4] [send] via NET/IB/4 shortk8snode10:175:734 [4] NCCL INFO Channel 03/0 : 13[5] -> 4[4] [receive] via NET/IB/4 shortk8snode10:176:739 [5] NCCL INFO Channel 08/0 : 5[5] -> 12[4] [send] via NET/IB/4 shortk8snode10:175:734 [4] NCCL INFO Channel 08/0 : 13[5] -> 4[4] [receive] via NET/IB/4 shortk8snode10:176:739 [5] NCCL INFO Channel 13/0 : 5[5] -> 12[4] [send] via NET/IB/4 shortk8snode10:175:734 [4] NCCL INFO Channel 13/0 : 13[5] -> 4[4] [receive] via NET/IB/4 shortk8snode10:177:740 [6] NCCL INFO Channel 04/0 : 6[6] -> 13[5] [send] via NET/IB/5 shortk8snode10:177:740 [6] NCCL INFO Channel 09/0 : 6[6] -> 13[5] [send] via NET/IB/5 shortk8snode10:177:740 [6] NCCL INFO Channel 14/0 : 6[6] -> 13[5] [send] via NET/IB/5 shortk8snode10:174:736 [3] NCCL INFO Channel 02/0 : 3[3] -> 10[2] [send] via NET/IB/3 shortk8snode10:174:736 [3] NCCL INFO Channel 07/0 : 3[3] -> 10[2] [send] via NET/IB/3 shortk8snode10:172:738 [1] NCCL INFO Channel 01/0 : 10[2] -> 1[1] [receive] via NET/IB/2 shortk8snode10:174:736 [3] NCCL INFO Channel 12/0 : 3[3] -> 10[2] [send] via NET/IB/3 shortk8snode10:172:738 [1] NCCL INFO Channel 06/0 : 10[2] -> 1[1] [receive] via NET/IB/2 shortk8snode10:172:738 [1] NCCL INFO Channel 11/0 : 10[2] -> 1[1] [receive] via NET/IB/2 shortk8snode10:172:738 [1] NCCL INFO Channel 00/0 : 1[1] -> 8[0] [send] via NET/IB/1 shortk8snode10:171:733 [0] NCCL INFO Channel 00/0 : 9[1] -> 0[0] [receive] via NET/IB/1 shortk8snode10:172:738 [1] NCCL INFO Channel 05/0 : 1[1] -> 8[0] [send] via NET/IB/1 shortk8snode10:173:735 [2] NCCL INFO Channel 02/0 : 11[3] -> 2[2] [receive] via 
NET/IB/3 shortk8snode10:171:733 [0] NCCL INFO Channel 05/0 : 9[1] -> 0[0] [receive] via NET/IB/1 shortk8snode10:172:738 [1] NCCL INFO Channel 10/0 : 1[1] -> 8[0] [send] via NET/IB/1 shortk8snode10:173:735 [2] NCCL INFO Channel 07/0 : 11[3] -> 2[2] [receive] via NET/IB/3 shortk8snode10:172:738 [1] NCCL INFO Channel 15/0 : 1[1] -> 8[0] [send] via NET/IB/1 shortk8snode10:171:733 [0] NCCL INFO Channel 10/0 : 9[1] -> 0[0] [receive] via NET/IB/1 shortk8snode10:173:735 [2] NCCL INFO Channel 12/0 : 11[3] -> 2[2] [receive] via NET/IB/3 shortk8snode10:171:733 [0] NCCL INFO Channel 15/0 : 9[1] -> 0[0] [receive] via NET/IB/1 shortk8snode10:171:733 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 01/0 : 2[2] -> 9[1] [send] via NET/IB/2 shortk8snode10:173:735 [2] NCCL INFO Channel 06/0 : 2[2] -> 9[1] [send] via NET/IB/2 shortk8snode10:173:735 [2] NCCL INFO Channel 11/0 : 2[2] -> 9[1] [send] via NET/IB/2 shortk8snode10:171:733 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 08/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 09/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 10/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 11/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 12/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 13/0 : 0[0] -> 7[7] via P2P/CUMEM 
shortk8snode10:171:733 [0] NCCL INFO Channel 14/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 15/0 : 0[0] -> 7[7] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 
[2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 
15/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:172:797 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. shortk8snode10:173:799 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. shortk8snode10:176:739 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] 
via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:175:794 [4] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. 
shortk8snode10:174:796 [3] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. shortk8snode10:171:803 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. shortk8snode10:176:791 [5] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. shortk8snode10:177:793 [6] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3. shortk8snode10:171:733 [0] NCCL INFO Connected all rings shortk8snode10:172:738 [1] NCCL INFO Connected all rings shortk8snode10:171:733 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Connected all rings shortk8snode10:171:733 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM 
shortk8snode10:172:738 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 00/0 : 8[0] -> 0[0] [receive] via NET/IB/1 shortk8snode10:173:735 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM shortk8snode10:171:733 [0] NCCL INFO Channel 05/0 : 8[0] -> 0[0] [receive] via NET/IB/1 
shortk8snode10:171:733 [0] NCCL INFO Channel 10/0 : 8[0] -> 0[0] [receive] via NET/IB/1 shortk8snode10:171:733 [0] NCCL INFO Channel 15/0 : 8[0] -> 0[0] [receive] via NET/IB/1 shortk8snode10:171:733 [0] NCCL INFO Channel 00/0 : 0[0] -> 8[0] [send] via NET/IB/1 shortk8snode10:171:733 [0] NCCL INFO Channel 05/0 : 0[0] -> 8[0] [send] via NET/IB/1 shortk8snode10:172:738 [1] NCCL INFO Channel 01/0 : 9[1] -> 1[1] [receive] via NET/IB/2 shortk8snode10:171:733 [0] NCCL INFO Channel 10/0 : 0[0] -> 8[0] [send] via NET/IB/1 shortk8snode10:172:738 [1] NCCL INFO Channel 06/0 : 9[1] -> 1[1] [receive] via NET/IB/2 shortk8snode10:171:733 [0] NCCL INFO Channel 15/0 : 0[0] -> 8[0] [send] via NET/IB/1 shortk8snode10:172:738 [1] NCCL INFO Channel 11/0 : 9[1] -> 1[1] [receive] via NET/IB/2 shortk8snode10:172:738 [1] NCCL INFO Channel 01/0 : 1[1] -> 9[1] [send] via NET/IB/2 shortk8snode10:172:738 [1] NCCL INFO Channel 06/0 : 1[1] -> 9[1] [send] via NET/IB/2 shortk8snode10:172:738 [1] NCCL INFO Channel 11/0 : 1[1] -> 9[1] [send] via NET/IB/2 shortk8snode10:174:736 [3] NCCL INFO Connected all rings shortk8snode10:178:737 [7] NCCL INFO Connected all rings shortk8snode10:174:736 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 12/0 : 3[3] -> 
4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 02/0 : 10[2] -> 2[2] [receive] via NET/IB/3 shortk8snode10:173:735 [2] NCCL INFO Channel 07/0 : 10[2] -> 2[2] [receive] via NET/IB/3 shortk8snode10:173:735 [2] NCCL INFO Channel 12/0 : 10[2] -> 2[2] [receive] via NET/IB/3 shortk8snode10:173:735 [2] NCCL INFO Channel 02/0 : 2[2] -> 10[2] [send] via NET/IB/3 shortk8snode10:173:735 [2] NCCL INFO Channel 07/0 : 2[2] -> 10[2] [send] via NET/IB/3 shortk8snode10:173:735 [2] NCCL INFO Channel 12/0 : 2[2] -> 10[2] [send] via NET/IB/3 shortk8snode10:175:734 [4] NCCL INFO Connected all rings shortk8snode10:176:739 [5] NCCL INFO Connected all rings shortk8snode10:177:740 [6] NCCL INFO Connected all rings shortk8snode10:175:734 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM 
shortk8snode10:175:734 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:174:736 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:177:740 
[6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 03/0 : 12[4] -> 4[4] [receive] via NET/IB/4 shortk8snode10:177:740 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 08/0 : 12[4] -> 4[4] [receive] via NET/IB/4 shortk8snode10:175:734 [4] NCCL INFO Channel 13/0 : 12[4] -> 4[4] [receive] via NET/IB/4 shortk8snode10:177:740 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Channel 03/0 : 4[4] -> 12[4] [send] via NET/IB/4 shortk8snode10:175:734 [4] NCCL INFO Channel 08/0 : 4[4] -> 12[4] [send] via NET/IB/4 shortk8snode10:175:734 [4] NCCL INFO Channel 13/0 : 4[4] -> 12[4] [send] via NET/IB/4 shortk8snode10:177:740 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM shortk8snode10:177:740 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 04/0 : 13[5] -> 5[5] [receive] via NET/IB/5 shortk8snode10:178:737 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 09/0 : 13[5] -> 5[5] [receive] via NET/IB/5 shortk8snode10:177:740 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 14/0 : 13[5] -> 5[5] [receive] via NET/IB/5 shortk8snode10:178:737 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 04/0 : 5[5] -> 13[5] [send] via NET/IB/5 shortk8snode10:176:739 [5] NCCL INFO Channel 09/0 : 5[5] -> 13[5] [send] via NET/IB/5 shortk8snode10:177:740 [6] NCCL 
INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 14/0 : 5[5] -> 13[5] [send] via NET/IB/5 shortk8snode10:178:737 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 06/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:178:737 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:173:735 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM shortk8snode10:172:738 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:176:739 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM shortk8snode10:175:734 [4] NCCL INFO Connected all trees shortk8snode10:177:740 [6] NCCL INFO Connected all trees shortk8snode10:176:739 [5] NCCL INFO Connected all trees shortk8snode10:177:740 [6] NCCL INFO NVLS comm 0x562b1dd2be40 headRank -1 nHeads 5 buffSize 4194304 memSize 2097152 
nvlsPerRankSize 201326592 nvlsTotalSize 1006632960
shortk8snode10:176:739 [5] NCCL INFO NVLS comm 0x55d35dd52f80 headRank 4 nHeads 5 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 1006632960
shortk8snode10:175:734 [4] NCCL INFO NVLS comm 0x55a12f01d300 headRank 3 nHeads 5 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 1006632960
shortk8snode10:173:735 [2] NCCL INFO Connected all trees
shortk8snode10:178:737 [7] NCCL INFO Connected all trees
shortk8snode10:172:738 [1] NCCL INFO Connected all trees
shortk8snode10:171:733 [0] NCCL INFO Connected all trees
shortk8snode10:174:736 [3] NCCL INFO Connected all trees
shortk8snode10:173:735 [2] NCCL INFO NVLS comm 0x55db3e681eb0 headRank 2 nHeads 5 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 1006632960
shortk8snode10:178:737 [7] NCCL INFO NVLS comm 0x55e8b4736a90 headRank -1 nHeads 5 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 1006632960
shortk8snode10:172:738 [1] NCCL INFO NVLS comm 0x5575b9b0b3f0 headRank 1 nHeads 5 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 1006632960
shortk8snode10:171:733 [0] NCCL INFO NVLS comm 0x5603a5cca990 headRank 0 nHeads 5 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 1006632960
shortk8snode10:174:736 [3] NCCL INFO NVLS comm 0x55ca7b3c5270 headRank -1 nHeads 5 buffSize 4194304 memSize 2097152 nvlsPerRankSize 201326592 nvlsTotalSize 1006632960
shortk8snode10:171:733 [0] NCCL INFO Channel 01/0 : 8[0] -> 0[0] [receive] via NET/IB/1
... (the matching Channel [receive]/[send] lines for each local rank and its inter-node peer, e.g. 0[0] <-> 8[0] via NET/IB/1, 1[1] <-> 9[1] via NET/IB/2, 2[2] <-> 10[2] via NET/IB/3, 4[4] <-> 12[4] via NET/IB/4, 5[5] <-> 13[5] via NET/IB/5, elided) ...
shortk8snode10:178:737 [7] NCCL INFO Connected NVLS tree
shortk8snode10:178:737 [7] NCCL INFO threadThresholds 8/8/64 | 128/8/64 | 512 | 512
shortk8snode10:178:737 [7] NCCL INFO 16 coll channels, 0 collnet channels, 16 nvls channels, 16 p2p channels, 2 p2p channels per peer
... (the same three "Connected NVLS tree" / threadThresholds / channel-count lines for the remaining ranks elided) ...
shortk8snode10:175:734 [4] NCCL INFO comm 0x55a12f01d300 rank 4 nranks 16 cudaDev 4 nvmlDev 4 busId a1000 commId 0x1ae93ec1860acd74 - Init COMPLETE
shortk8snode10:171:733 [0] NCCL INFO comm 0x5603a5cca990 rank 0 nranks 16 cudaDev 0 nvmlDev 0 busId 61000 commId 0x1ae93ec1860acd74 - Init COMPLETE
shortk8snode10:173:735 [2] NCCL INFO comm 0x55db3e681eb0 rank 2 nranks 16 cudaDev 2 nvmlDev 2 busId 65000 commId 0x1ae93ec1860acd74 - Init COMPLETE
shortk8snode10:177:740 [6] NCCL INFO comm 0x562b1dd2be40 rank 6 nranks 16 cudaDev 6 nvmlDev 6 busId a5000 commId 0x1ae93ec1860acd74 - Init COMPLETE
shortk8snode10:176:739 [5] NCCL INFO comm 0x55d35dd52f80 rank 5 nranks 16 cudaDev 5 nvmlDev 5 busId a3000 commId 0x1ae93ec1860acd74 - Init COMPLETE
shortk8snode10:172:738 [1] NCCL INFO comm 0x5575b9b0b3f0 rank 1 nranks 16 cudaDev 1 nvmlDev 1 busId 63000 commId 0x1ae93ec1860acd74 - Init COMPLETE
shortk8snode10:178:737 [7] NCCL INFO comm 0x55e8b4736a90 rank 7 nranks 16 cudaDev 7 nvmlDev 7 busId a7000 commId 0x1ae93ec1860acd74 - Init COMPLETE
shortk8snode10:174:736 [3] NCCL INFO comm 0x55ca7b3c5270 rank 3 nranks 16 cudaDev 3 nvmlDev 3 busId 67000 commId 0x1ae93ec1860acd74 - Init COMPLETE

[config] num_tokens=4096, hidden=7168, num_topk_groups=2, num_topk=8
[layout] Kernel performance: 0.075 ms
Could you set `test_ll_compatibility = False`? Testing the normal and low-latency kernels separately may work around this destruction issue on your platform.
I encountered a similar issue with SGLang + DeepEP on a single 8 x H20 node. I'm not sure whether it's related.
My command:
NVSHMEM_DEBUG=INFO python3 -m sglang.compile_deep_gemm --model /root/DeepSeek-R1 --tp 8 --trust-remote-code --host 0.0.0.0 --port 30000 --enable-deepep-moe --deepep-mode auto --max-running-requests 128 --disable-radix-cache --mem-fraction-static 0.9 --stream-output --cuda-graph-max-bs 128
@LyricZhao Hi, I hit a similar error too. After updating to the latest code, I get RuntimeError: CUDA error: an illegal memory access was encountered. However, when I revert to commit 82dcf48fd315d7b83f2cd2b4f1d1f1fda6af8ed2 (on which I previously ran my experiments), the program executes normally, so I suspect one of the recent merges introduced a bug. I'm testing test_internode.py over IB.
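To narrow down which kernel actually faults, it may help to make the error synchronous and run under compute-sanitizer (a hedged sketch; the `tests/test_internode.py` path is assumed from the repo layout, and the commented commands need the multi-node setup to actually run):

```shell
# Illegal-memory-access errors are reported asynchronously by default, so the
# Python traceback usually points at an unrelated later call. Forcing blocking
# launches makes the error surface at the faulting kernel's launch site.
export CUDA_LAUNCH_BLOCKING=1
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"

# Re-run the repro with the variable set (path assumed):
# python tests/test_internode.py
# For a precise out-of-bounds report (kernel name, address, access size):
# compute-sanitizer --tool memcheck python tests/test_internode.py
```

The compute-sanitizer output in particular would help confirm whether a recent merge introduced an out-of-bounds access.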
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_0. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_1. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_2. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_0. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_0. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_3. Skipping...
WARN: GPU cannot map UAR of device mlx5_1. Skipping...
WARN: GPU cannot map UAR of device mlx5_1. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_6. Skipping...
WARN: GPU cannot map UAR of device mlx5_2. Skipping...
WARN: GPU cannot map UAR of device mlx5_2. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_3. Skipping...
WARN: GPU cannot map UAR of device mlx5_3. Skipping...
WARN: GPU cannot map UAR of device mlx5_7. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_0. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_6. Skipping...
WARN: GPU cannot map UAR of device mlx5_8. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_1. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_6. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_9. Skipping...
WARN: GPU cannot map UAR of device mlx5_7. Skipping...
/deepepdir/nvshmem_src/src/host/transport/transport.cpp:nvshmemi_transport_init:275: init failed for transport: IBGDA WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_0. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_2. Skipping...
WARN: GPU cannot map UAR of device mlx5_7. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_8. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_1. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_8. Skipping...
WARN: GPU cannot map UAR of device mlx5_3. Skipping...
WARN: GPU cannot map UAR of device mlx5_9. Skipping...
/deepepdir/nvshmem_src/src/host/transport/transport.cpp:nvshmemi_transport_init:275: init failed for transport: IBGDA WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_2. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_9. Skipping...
/deepepdir/nvshmem_src/src/host/transport/transport.cpp:nvshmemi_transport_init:275: init failed for transport: IBGDA WARN: GPU cannot map UAR of device mlx5_6. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_3. Skipping...
WARN: GPU cannot map UAR of device mlx5_7. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_0. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_6. Skipping...
WARN: GPU cannot map UAR of device mlx5_8. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_1. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_7. Skipping...
WARN: GPU cannot map UAR of device mlx5_9. Skipping...
/deepepdir/nvshmem_src/src/host/transport/transport.cpp:nvshmemi_transport_init:275: init failed for transport: IBGDA WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_0. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_2. Skipping... WARN: GPU cannot map UAR of device mlx5_8. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_0. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_1. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_9. Skipping...
/deepepdir/nvshmem_src/src/host/transport/transport.cpp:nvshmemi_transport_init:275: init failed for transport: IBGDA WARN: GPU cannot map UAR of device mlx5_3. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_1. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_2. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_6. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_2. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_3. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_7. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_3. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_6. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_8. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_6. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_7. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_9. Skipping...
/deepepdir/nvshmem_src/src/host/transport/transport.cpp:nvshmemi_transport_init:275: init failed for transport: IBGDA
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_7. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_8. Skipping...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_8. Skipping...
WARN: GPU cannot map UAR of device mlx5_9. Skipping...
/deepepdir/nvshmem_src/src/host/transport/transport.cpp:nvshmemi_transport_init:275: init failed for transport: IBGDA
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with GPU as handler failed. We may need to enter the CPU fallback path.
WARN: GPU cannot map UAR of device mlx5_9. Skipping...
/deepepdir/nvshmem_src/src/host/transport/transport.cpp:nvshmemi_transport_init:275: init failed for transport: IBGDA
[config] num_tokens=4096, hidden=7168, num_topk_groups=2, num_topk=8
[layout] Kernel performance: 0.071 ms
[testing] Running with BF16, without top-k (async=False, previous=False) ...
W0522 19:30:37.926604 5128 site-packages/torch/multiprocessing/spawn.py:160] Terminating process 5197 via signal SIGTERM
W0522 19:30:37.927222 5128 site-packages/torch/multiprocessing/spawn.py:160] Terminating process 5198 via signal SIGTERM
W0522 19:30:37.927315 5128 site-packages/torch/multiprocessing/spawn.py:160] Terminating process 5199 via signal SIGTERM
W0522 19:30:37.927396 5128 site-packages/torch/multiprocessing/spawn.py:160] Terminating process 5200 via signal SIGTERM
W0522 19:30:37.927469 5128 site-packages/torch/multiprocessing/spawn.py:160] Terminating process 5201 via signal SIGTERM
W0522 19:30:37.927547 5128 site-packages/torch/multiprocessing/spawn.py:160] Terminating process 5203 via signal SIGTERM
W0522 19:30:37.927618 5128 site-packages/torch/multiprocessing/spawn.py:160] Terminating process 5204 via signal SIGTERM
Traceback (most recent call last):
  File "/workdir/deep_ep_dev/tests/test_internode.py", line 247, in
-- Process 5 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/conda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
    fn(i, *args)
  File "/workdir/deep_ep_dev/tests/test_internode.py", line 235, in test_loop
    test_main(i, local_rank, num_local_ranks, num_ranks, num_nodes, rank, buffer, group)
  File "/workdir/deep_ep_dev/tests/test_internode.py", line 109, in test_main
    recv_x, recv_topk_idx, recv_topk_weights, recv_num_tokens_per_expert_list, handle, event = buffer.dispatch(**dispatch_args)
  File "/home/hadoop-hdp/.local/lib/python3.9/site-packages/deep_ep/buffer.py", line 282, in dispatch
    return self.internode_dispatch(x, handle, num_tokens_per_rank, num_tokens_per_rdma_rank, is_token_in_rank, num_tokens_per_expert,
  File "/home/hadoop-hdp/.local/lib/python3.9/site-packages/deep_ep/buffer.py", line 390, in internode_dispatch
    recv_src_meta, send_rdma_head, send_nvl_head, event = self.runtime.internode_dispatch(
RuntimeError: Failed: CUDA error /workdir/deep_ep_dev/csrc/kernels/internode.cu:1080 'an illegal memory access was encountered'
@defei-coder The latest code has changed the transport of the normal kernels from IBRC to IBGDA, and these logs show that IBGDA is not functioning properly in your environment:
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
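The error=800 in these warnings is a numeric cudaError_t value. A small lookup table, for log triage only (the numeric values are taken from CUDA's cuda_runtime_api.h; this is an illustrative subset, not part of DeepEP or NVSHMEM):

```python
# Illustrative subset of cudaError_t codes relevant to these logs; values per
# CUDA's cuda_runtime_api.h. For log triage only, not part of DeepEP or NVSHMEM.
CUDA_ERRORS = {
    700: "cudaErrorIllegalAddress",  # 'an illegal memory access was encountered'
    800: "cudaErrorNotPermitted",    # driver refused the requested operation
    801: "cudaErrorNotSupported",
}

def describe(code: int) -> str:
    """Map a numeric CUDA runtime error to its enum name, if known."""
    return CUDA_ERRORS.get(code, f"unrecognized cudaError_t {code}")

print(describe(800))  # cudaErrorNotPermitted
```

Error 800 (cudaErrorNotPermitted) on cudaHostRegister with IoMemory typically means the driver refused to map the NIC's doorbell (UAR) region for the GPU, which matches the "GPU cannot map UAR" warnings above.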
@sphish Thanks for the reply. I know about this change (normal kernels moving from IBRC to IBGDA) and will check the IBGDA settings. By the way, do the normal kernels no longer work with IBRC at all? Also, can the normal kernels run on a RoCE network? Do I need to make any changes?
@defei-coder As of the latest commit, the normal kernels are no longer compatible with IBRC. And yes, the normal kernels can run on a RoCE network.
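For RoCE, NVSHMEM needs to be told which GID to use. A sketch of the environment variables commonly involved (names follow the NVSHMEM documentation; the values here are placeholders, and the right GID index is site-specific):

```shell
# Sketch: NVSHMEM/IBGDA-related environment for RoCE; values are placeholders.
export NVSHMEM_IB_ENABLE_IBGDA=1   # enable the IBGDA transport
export NVSHMEM_IB_GID_INDEX=3      # pick the RoCEv2 GID for your fabric (see `show_gids`)
echo "IBGDA=$NVSHMEM_IB_ENABLE_IBGDA GID_INDEX=$NVSHMEM_IB_GID_INDEX"
```

If the GID index points at the wrong entry (e.g. a RoCEv1 or link-local GID), transport init can fail even when the NIC itself is healthy.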
I'm still confused. How did you solve the problem?
@sphish @defei-coder Were you able to fix these issues? I have set the relevant NVIDIA driver option (PeerMappingOverride enabled), yet I continue to get these errors.
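For reference, "peer mapping override" usually refers to a registry key of the NVIDIA kernel module. A sketch of how it is typically set (the file path and syntax are assumptions about a stock driver install; the driver must be reloaded, or the host rebooted, for it to take effect):

```
# /etc/modprobe.d/nvidia.conf (assumed path)
options nvidia NVreg_RegistryDwords="PeerMappingOverride=1;"
```

Whether the key actually reached the running driver can be checked by inspecting /proc/driver/nvidia/params.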