Fix EP deployment issues

Open CUHKSZzxy opened this issue 2 months ago • 0 comments

Modifications

Expose deepep env var

The default DeepEP buffer SM count (num_sms) triggers the following assertion failure on multi-node H200 deployments, so this change exposes it as an environment variable that users can configure. A value that works on H200 is DEEPEP_BUFFER_NUM_SMS=16.

csrc/kernels/internode.cu:386, condition: ibgda_get_state()->num_rc_per_pe == num_channels or ibgda_get_state()->num_rc_per_pe >= num_sms

This is a known issue in DeepEP:

https://github.com/deepseek-ai/DeepEP/issues/226
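A minimal sketch of how such an environment variable might be read and validated. Only the name DEEPEP_BUFFER_NUM_SMS and the value 16 come from this description; the helper name, the fallback value, and the validation rules are assumptions for illustration and do not reflect the actual implementation.

```python
import os

# Hypothetical fallback, used only in this sketch. The real default is
# defined by the project and is not stated in this description.
_FALLBACK_NUM_SMS = 20


def read_deepep_buffer_num_sms(env=None, fallback=_FALLBACK_NUM_SMS):
    """Return the SM count requested via DEEPEP_BUFFER_NUM_SMS, else fallback."""
    env = os.environ if env is None else env
    raw = env.get("DEEPEP_BUFFER_NUM_SMS")
    if raw is None or raw.strip() == "":
        # Variable not set: keep the default behaviour.
        return fallback
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"DEEPEP_BUFFER_NUM_SMS must be an integer, got {raw!r}"
        ) from None
    if value <= 0:
        raise ValueError(f"DEEPEP_BUFFER_NUM_SMS must be positive, got {value}")
    return value


if __name__ == "__main__":
    # Example: the H200 setting suggested above.
    print(read_deepep_buffer_num_sms({"DEEPEP_BUFFER_NUM_SMS": "16"}))
```

Failing early on a malformed value keeps a configuration typo from surfacing later as a hard-to-trace kernel assertion.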

Fix DeepEP mode in CUDA graph

Switch the DeepEP mode when moving between prefill and decode, and clear the buffer as part of the switch (DLBlas performs the clear when the mode is set to low latency). Without this, execution triggers a CUDA illegal memory access, either in DeepEP itself or in the DeepGEMM kernel that follows. The same problem is reported in

https://github.com/sgl-project/sglang/pull/11666
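The switching logic can be sketched roughly as follows. Every name here (DeepEPMode, FakeBuffer, ModeSwitcher) is hypothetical and stands in for the real DeepEP / DLBlas objects. The sketch also assumes that decode uses the low-latency mode and prefill uses the normal mode, and it only illustrates the "clear on switching to low latency" behaviour described above.

```python
from enum import Enum


class DeepEPMode(Enum):
    NORMAL = "normal"            # assumed to be used for prefill
    LOW_LATENCY = "low_latency"  # assumed to be used for decode


class FakeBuffer:
    """Stand-in for the real communication buffer; it only counts clears."""

    def __init__(self):
        self.clear_count = 0

    def clear(self):
        self.clear_count += 1


class ModeSwitcher:
    """Tracks the current mode and clears the buffer on entering low latency."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.mode = DeepEPMode.NORMAL

    def set_mode(self, mode):
        if mode is self.mode:
            return  # nothing to do, avoid redundant clears
        if mode is DeepEPMode.LOW_LATENCY:
            # Stale data left by the normal-mode kernels must not be read
            # by the low-latency kernels.
            self.buffer.clear()
        self.mode = mode

    def before_forward(self, is_decoding):
        """Call before each forward pass, including CUDA graph capture/replay."""
        target = DeepEPMode.LOW_LATENCY if is_decoding else DeepEPMode.NORMAL
        self.set_mode(target)


if __name__ == "__main__":
    switcher = ModeSwitcher(FakeBuffer())
    for is_decoding in (False, True, True, False, True):
        switcher.before_forward(is_decoding)
    print(switcher.mode, switcher.buffer.clear_count)
```

Making the switch idempotent means it can be called unconditionally before every step without clearing the buffer more often than needed.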

Upgrade DeepEP / DeepGEMM / DLBlas / FlashMLA

DeepEP -> v1.2.1
DeepGEMM -> v2.1.1.post3
DLBlas -> v0.0.6
FlashMLA -> commit 1408756 (no official release)

Other modifications

Add CUDA dependencies needed by deep_gemm
Pin the torch version to avoid a mismatch between build-time and runtime versions (which leads to undefined-symbol errors in deep_gemm)
Add vim
Add some comments

Oct 30 '25 03:10 CUHKSZzxy