Confusing Concurrent Host/Device Bandwidth Matrix results from busGrind -p 1 -u 0 -e 0 -d 1
Running busGrind -p 1 -u 0 -e 0 -d 1, I got: .......
Test Description: Bus bandwidth between the host and a single device
Host/Device Bandwidth Matrix (GB/s), memory=Pinned
Dir\D        0       1       2       3       4       5       6       7
D2H      56.83   57.12   57.14   57.15   55.37   55.47   55.49   53.53
H2D      56.17   56.21   56.21   56.20   56.17   56.17   56.12   56.14
BiDir   101.26  101.38  101.40  101.37   89.20   43.62    7.25    9.71
Test Description: Bus bandwidth between the host and multiple devices concurrently
Concurrent Host/Device Bandwidth Matrix (GB/s), memory=Pinned
Dir\D        0       1       2       3       4       5       6       7   Total
H2D      44.61   44.80   43.68   43.69   25.32   25.49   25.45   25.47  278.51
D2H      15.94   15.91   15.80   15.84   11.57   11.58   11.58   11.60  109.83
BiDir    22.55   22.76   22.62   22.64   10.04    4.40   17.99   18.03  141.02
As we can see, the BiDir results are confusing. Why do device 5 and device 7 have such low bandwidth? Is this expected?
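To make the anomaly easier to point at, here is a minimal Python sketch that flags devices whose BiDir bandwidth falls well below the rest of the row. The BiDir values are copied from the output above. The helper name `low_devices` and the cutoff of half the row median are my own assumptions for illustration, not anything busGrind defines.

```python
from statistics import median

# BiDir rows copied from the busGrind output above (GB/s), indexed by device.
single_bidir = [101.26, 101.38, 101.40, 101.37, 89.20, 43.62, 7.25, 9.71]
concurrent_bidir = [22.55, 22.76, 22.62, 22.64, 10.04, 4.40, 17.99, 18.03]


def low_devices(row, fraction=0.5):
    """Return device indices whose bandwidth is below fraction * row median.

    The 0.5 default is an arbitrary, hypothetical threshold chosen for this post.
    """
    cutoff = fraction * median(row)
    return [dev for dev, bw in enumerate(row) if bw < cutoff]


print("single-device BiDir outliers:", low_devices(single_bidir))      # [5, 6, 7]
print("concurrent BiDir outliers:", low_devices(concurrent_bidir))     # [4, 5]
```

With this cutoff, devices 5, 6, and 7 stand out in the single-device test and devices 4 and 5 stand out in the concurrent test. A different threshold would flag a different set.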
busGrind is a tool from the CUDA demo suite. This might not be the right place to ask, but I couldn’t find a repository for the demo suite.
Looking forward to your response.
For this kind of question, please post in the NVIDIA Developer Forums (https://forums.developer.nvidia.com/). Thanks.