When testing test_internode.py, what is the effect of setting NVSHMEM_DISABLE_P2P=1? Will NVLink be disabled?
After setting NVSHMEM_DISABLE_P2P=1, you cannot use NVSHMEM for NVLink transfers. However, this is not an issue in our implementation, as our NVLink data transfers do not rely on the NVSHMEM API. Instead, we directly use CUDA PTX instructions for NVLink data transfers.
Thank you for your reply. Further, I want to confirm, what is the purpose of setting NVSHMEM_DISABLE_P2P=1 in the low_latency scenario? And what is the impact if NVSHMEM_DISABLE_P2P=1 is set in internode scenarios?
Leaving P2P enabled (NVSHMEM_DISABLE_P2P=0) also works, but we disable it to ensure that we are not using NVLink through NVSHMEM.
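For anyone who wants to toggle this from a test script, here is a minimal sketch of one way to do it in Python. The helper name set_nvshmem_p2p and its env parameter are hypothetical and are not part of this repository or of NVSHMEM. It assumes the variable is read when NVSHMEM initializes, so it has to be set before that point, and that leaving the variable unset keeps NVSHMEM's default behavior (P2P enabled).

```python
import os


def set_nvshmem_p2p(disabled, env=None):
    """Set or clear NVSHMEM_DISABLE_P2P in an environment mapping.

    Hypothetical helper for illustration only. `env` defaults to the
    current process environment; pass a dict to try it without side effects.
    """
    target = os.environ if env is None else env
    if disabled:
        # Ask NVSHMEM not to use its P2P (NVLink) transport.
        target["NVSHMEM_DISABLE_P2P"] = "1"
    else:
        # Remove the variable so NVSHMEM falls back to its default
        # (assumed here to be P2P enabled).
        target.pop("NVSHMEM_DISABLE_P2P", None)
    return target


# Demonstrated on a plain dict so the real environment is left untouched.
example_env = set_nvshmem_p2p(True, env={})
print(example_env)
```

In a real run the call would be made without the env argument, and before any code path that initializes NVSHMEM.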
But when I set NVSHMEM_DISABLE_P2P to 0 and run the low-latency test on two nodes (16 GPUs), I get the following error:
It appears that the program is failing during the bootstrap phase of NVSHMEM, which doesn't seem reasonable, but I'm not sure why this is happening.