
Fix EP deployment issues

Open • CUHKSZzxy opened this issue 2 months ago • 0 comments

Modifications

  1. Expose the DeepEP environment variable

With the default DeepEP buffer num_sms setting, multi-node deployments on H200 fail with the error below. We therefore expose this setting as an environment variable so users can configure it. A value that works on H200 is DEEPEP_BUFFER_NUM_SMS=16.

csrc/kernels/internode.cu:386, condition: ibgda_get_state()->num_rc_per_pe == num_channels or ibgda_get_state()->num_rc_per_pe >= num_sms

This is a known issue in DeepEP:

  • https://github.com/deepseek-ai/DeepEP/issues/226
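A minimal Python sketch of the two pieces above, under stated assumptions: reading DEEPEP_BUFFER_NUM_SMS with a fallback, and a mirror of the condition reported at internode.cu:386. The helper names get_deepep_num_sms and internode_rc_condition are hypothetical, and how lmdeploy actually forwards the value to DeepEP is not shown here.

```python
import os

# Env var name comes from this PR; the helper names and fallback behaviour are assumptions.
ENV_NAME = "DEEPEP_BUFFER_NUM_SMS"


def get_deepep_num_sms(default: int) -> int:
    """Return the SM count from DEEPEP_BUFFER_NUM_SMS, or `default` if it is unset or empty."""
    raw = os.environ.get(ENV_NAME, "").strip()
    if not raw:
        return default
    value = int(raw)  # raises ValueError for non-integer input
    if value <= 0:
        raise ValueError(f"{ENV_NAME} must be a positive integer, got {value}")
    return value


def internode_rc_condition(num_rc_per_pe: int, num_channels: int, num_sms: int) -> bool:
    """Hypothetical mirror of the check at csrc/kernels/internode.cu:386."""
    return num_rc_per_pe == num_channels or num_rc_per_pe >= num_sms
```

For instance, with a made-up num_rc_per_pe of 16, num_sms=16 satisfies the second clause, whereas num_sms=24 with 12 channels satisfies neither clause. This illustrates why lowering the SM count can make the check pass.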
  2. Fix DeepEP mode in CUDA graph

Switch the DeepEP mode between prefill and decode, and clear the buffer when switching to low-latency mode (this is done on the DLBlas side). Without this, a CUDA illegal memory access is triggered in DeepEP or in the subsequent DeepGEMM kernel, as reported in:

  • https://github.com/sgl-project/sglang/pull/11666
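The mode switch described above can be sketched as a small state tracker. This is a hypothetical illustration, not lmdeploy or DLBlas code: it assumes prefill uses DeepEP's normal mode and decode uses its low-latency mode, and it represents the buffer-clearing step as an injected callback rather than a real DeepEP call.

```python
from enum import Enum
from typing import Callable, Optional


class DeepEPMode(Enum):
    NORMAL = "normal"            # assumed mode for prefill
    LOW_LATENCY = "low_latency"  # assumed mode for decode


class DeepEPModeSwitcher:
    """Hypothetical sketch: track the active mode and clear the buffer on entering low latency."""

    def __init__(self, clear_buffer: Callable[[], None]) -> None:
        self._clear_buffer = clear_buffer
        self.mode: Optional[DeepEPMode] = None

    def set_mode(self, is_decoding: bool) -> None:
        target = DeepEPMode.LOW_LATENCY if is_decoding else DeepEPMode.NORMAL
        if target is self.mode:
            return  # already in the right mode; nothing to flip
        if target is DeepEPMode.LOW_LATENCY:
            # Per the PR, the buffer is cleared when switching to low latency
            # (done on the DLBlas side in lmdeploy; a callback stands in for it here).
            self._clear_buffer()
        self.mode = target
```

Clearing only on an actual transition into low-latency mode keeps repeated decode steps from paying the clearing cost on every call.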
  3. Upgrade DeepEP / DeepGEMM / DLBlas / FlashMLA
  • DeepEP -> v1.2.1
  • DeepGEMM -> v2.1.1.post3
  • DLBlas -> v0.0.6
  • FlashMLA -> commit 1408756 (no official release)
  4. Other modifications
  • Add some CUDA dependencies required by deep_gemm
  • Pin the torch version to avoid a build/runtime version mismatch (which leads to undefined-symbol errors in deep_gemm)
  • Add vim
  • Add some comments

CUHKSZzxy, Oct 30 '25 03:10