Can DeepEP run correctly in cudaGraph mode?

Open ghostplant opened this issue 7 months ago • 3 comments

Two questions:

  1. Can DeepEP run normally in cudaGraph mode?
  2. Does DeepEP perform "dropless" MoE dispatch? (i.e. no tokens are discarded even if they are heavily routed to a small number of experts)

ghostplant avatar May 17 '25 13:05 ghostplant

  1. Yes, but only for the normal kernels;
  2. Yes. If you want to drop tokens, you should do it at the gate (by masking some topk_idx entries to -1). DeepEP supports ignoring a -1 expert selection (nothing is sent in such cases).

LyricZhao avatar May 22 '25 09:05 LyricZhao
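
As a minimal sketch of the gate-side masking described above: the reply only states that a topk_idx entry of -1 is ignored at dispatch. The capacity rule below (first come, first served per expert) and the helper name `mask_over_capacity` are assumptions made for illustration, not part of DeepEP, and a real gate would apply the same idea to tensors rather than Python lists.

```python
def mask_over_capacity(topk_idx, num_experts, capacity):
    """Return a copy of topk_idx where selections beyond an expert's
    capacity are replaced by -1 (hypothetical first-come-first-served policy)."""
    load = [0] * num_experts  # selections accepted so far, per expert
    masked = []
    for token_choices in topk_idx:
        row = []
        for expert in token_choices:
            if expert >= 0 and load[expert] < capacity:
                load[expert] += 1
                row.append(expert)
            else:
                row.append(-1)  # dropped selection: nothing is sent for it
        masked.append(row)
    return masked


# Four tokens, top-2 routing, expert 0 is oversubscribed.
topk_idx = [[0, 1], [0, 2], [0, 1], [0, 3]]
masked = mask_over_capacity(topk_idx, num_experts=4, capacity=2)
print(masked)
```

Leaving topk_idx untouched corresponds to the dropless behaviour: every selection is dispatched regardless of how skewed the routing is.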

@LyricZhao Thank you. For dropless dispatch, will the utilization still be that fast when gating selection is imbalanced (e.g. all tokens routed to the same GPU)?

ghostplant avatar May 22 '25 14:05 ghostplant

will the utilization still be that fast when gating selection is imbalanced

Overall performance will be bounded by the imbalanced rank. On the imbalanced rank itself, utilization should be full.

LyricZhao avatar May 23 '25 02:05 LyricZhao
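
A rough way to picture this bound: count how many tokens each rank receives and compare the busiest rank with the average. The sketch below uses a deliberately simplified cost model (step time proportional to the largest per-rank token count), which is an assumption made for illustration and not a measurement of DeepEP. The helper names are hypothetical.

```python
def tokens_per_rank(topk_idx, experts_per_rank, num_ranks):
    """Count tokens received by each rank; -1 selections are skipped."""
    counts = [0] * num_ranks
    for token_choices in topk_idx:
        # A token is counted once per destination rank it is routed to.
        ranks = {e // experts_per_rank for e in token_choices if e >= 0}
        for r in ranks:
            counts[r] += 1
    return counts


def slowdown(counts):
    """Busiest rank's load relative to a perfectly even split."""
    return max(counts) / (sum(counts) / len(counts))


# 8 tokens, top-1 routing, 4 experts spread over 2 ranks.
balanced = [[0], [1], [2], [3], [0], [1], [2], [3]]
skewed = [[0]] * 8  # every token goes to expert 0 on rank 0

balanced_counts = tokens_per_rank(balanced, experts_per_rank=2, num_ranks=2)
skewed_counts = tokens_per_rank(skewed, experts_per_rank=2, num_ranks=2)
print(balanced_counts, slowdown(balanced_counts))
print(skewed_counts, slowdown(skewed_counts))
```

Under this model the skewed case keeps rank 0 fully busy while the other rank idles, so the whole step takes as long as the overloaded rank needs.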

  1. Yes, but only for the normal kernels;

In sglang, the low-latency kernels can run normally in cudaGraph mode!

Why only for the normal kernels? @LyricZhao

alpha-baby avatar Jun 24 '25 03:06 alpha-baby