Why must NUM_MAX_NVL_PEERS be 8?
In deep_ep.hpp, can it be set to a smaller number? For example, I only have a single node with 2 H800s to run test_low_latency.py.
If you only have one node, EP2 is supported for both intranode kernels (via NVLink) and low-latency kernels (via RDMA).
For multiple nodes, each with fewer than 8 GPUs, you can change the NUM_MAX_NVL_PEERS macro to match your setup and see whether the kernels work. We may add a compile-time macro for this later. Thanks for the feedback.
Related issue: https://github.com/deepseek-ai/DeepEP/issues/477
@LyricZhao Hitting this assert when switching NUM_MAX_NVL_PEERS to 4:
DeepEP/csrc/kernels/internode.cu(295): error: static assertion failed with "Invalid number of NVL peers"
static_assert(4 * sizeof(bool) == sizeof(uint64_t), "Invalid number of NVL peers");
^
DeepEP/csrc/kernels/internode.cu(507): error: static assertion failed with "Invalid number of NVL peers"
static_assert(4 * sizeof(bool) == sizeof(uint64_t), "Invalid number of NVL peers");
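Reading the assertion text alone (this is an inference from the error message, not from the DeepEP source), the check only passes when NUM_MAX_NVL_PEERS bools occupy exactly 8 bytes, which suggests the kernels handle the per-peer bool flags as a single uint64_t. A minimal sketch of that idea follows; `fills_uint64` and `pack_flags` are hypothetical names used for illustration and do not exist in DeepEP.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not DeepEP code: has the same shape as the failing
// static_assert. True only when N bools fill a uint64_t exactly.
template <std::size_t N>
constexpr bool fills_uint64() {
    return N * sizeof(bool) == sizeof(std::uint64_t);
}

// Hypothetical helper, not DeepEP code: copy N bool flags into one 64-bit word,
// the way a single wide load of the flag array would see them.
// Unused high bytes stay zero when N < 8.
template <std::size_t N>
std::uint64_t pack_flags(const bool (&flags)[N]) {
    static_assert(N * sizeof(bool) <= sizeof(std::uint64_t),
                  "too many flags for one 64-bit word");
    std::uint64_t word = 0;
    std::memcpy(&word, flags, N * sizeof(bool));
    return word;
}
```

On platforms where sizeof(bool) is 1, `fills_uint64<8>()` is true while `fills_uint64<4>()` and `fills_uint64<2>()` are false, which matches the errors above. If the kernels do rely on such a wide access, removing the assertion would leave the size mismatch in place rather than fix it.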
I encountered the same issue. When I tried to work around it by commenting out the assertions, I got a new error:
/miniconda3/envs/sglang/lib/python3.10/site-packages/deep_ep/buffer.py", line 135, in __init__
self.runtime.sync(device_ids, ipc_handles, root_unique_id)
RuntimeError: Failed: CUDA error /home/annali/sglang/DeepEP/csrc/deep_ep.cpp:113 'invalid resource handle'
I'm on the latest branch of DeepEP with NUM_MAX_NVL_PEERS=2