Can DeepEP run in environments with more than 160 ranks?
I noticed a restriction in `csrc/deep_ep.cpp`:

```cpp
EP_HOST_ASSERT(0 <= rank && rank < num_ranks &&
               (num_ranks <= NUM_MAX_NVL_PEERS * NUM_MAX_RDMA_PEERS || low_latency_mode));
```

where `NUM_MAX_NVL_PEERS = 8` and `NUM_MAX_RDMA_PEERS = 20`. This implies the rank count cannot exceed 8 × 20 = 160.
I tested this on a 24-node cluster, and the assertion was indeed triggered. So my question is: does DeepEP actually support training on clusters with more than 20 nodes, or am I misunderstanding something?
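For reference, here is a minimal standalone sketch of the quoted check; the constants are copied from above, and `passes_rank_check` is my own helper for illustration, not anything in DeepEP:

```cpp
// Standalone sketch of the quoted check (not the real EP_HOST_ASSERT macro);
// constants copied from the values observed in csrc/deep_ep.cpp.
#include <cstdio>

constexpr int NUM_MAX_NVL_PEERS  = 8;   // GPUs reachable over NVLink within a node
constexpr int NUM_MAX_RDMA_PEERS = 20;  // nodes reachable over RDMA

// Returns true iff (rank, num_ranks, low_latency_mode) would pass the assertion.
bool passes_rank_check(int rank, int num_ranks, bool low_latency_mode) {
    return 0 <= rank && rank < num_ranks &&
           (num_ranks <= NUM_MAX_NVL_PEERS * NUM_MAX_RDMA_PEERS || low_latency_mode);
}

int main() {
    // 24 nodes x 8 GPUs = 192 ranks: fails in normal mode, passes in low-latency mode.
    std::printf("192 ranks, normal mode:      %d\n", passes_rank_check(0, 192, false)); // 0
    std::printf("192 ranks, low-latency mode: %d\n", passes_rank_check(0, 192, true));  // 1
    // 20 nodes x 8 GPUs = 160 ranks: the largest size allowed in normal mode.
    std::printf("160 ranks, normal mode:      %d\n", passes_rank_check(0, 160, false)); // 1
    return 0;
}
```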
Typically, a training job might run on more than 20 nodes, but the EP group size within that job usually does not exceed 160.
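To make that concrete, here is a hedged arithmetic sketch; the 96-rank group size and the contiguous grouping are illustrative assumptions, not DeepEP defaults. A 192-rank job spans 24 nodes overall, yet each EP group stays within the 160-rank bound:

```cpp
// Hypothetical sketch: a 24-node x 8-GPU job (192 ranks) split into EP groups
// whose size stays within the 160-rank ceiling. Sizes are illustrative only.
#include <cstdio>

int main() {
    const int nodes = 24, gpus_per_node = 8;
    const int world_size = nodes * gpus_per_node;  // 192 ranks in the whole job
    const int max_ep = 8 /*NVL*/ * 20 /*RDMA*/;    // 160-rank ceiling per EP group
    const int ep_group_size = 96;                  // e.g. two EP groups of 96 ranks

    std::printf("world=%d, EP group=%d, within bound: %d\n",
                world_size, ep_group_size, ep_group_size <= max_ep);

    // Which EP group each sample rank falls into, assuming contiguous grouping.
    const int samples[] = {0, 95, 96, 191};
    for (int rank : samples)
        std::printf("rank %3d -> EP group %d\n", rank, rank / ep_group_size);
    return 0;
}
```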
@ZhenguoYao1 Using the GB200 support from https://github.com/fzyzcjy/DeepEP/tree/feat/dev_20250914, you can scale beyond 160 ranks.
@goelayu Did you modify this https://github.com/fzyzcjy/DeepEP/blob/483f00af8490b0cc378823c6adecf9ea67602071/csrc/kernels/launch.cuh#L54 to scale up the rank count?