Shifang Xu
It would probably be better for a colleague who is familiar with the fp4 weight format to handle this issue, so I am closing this PR.
> > Original w13_weight_scale shape: [num_local_experts, M, K]
> > After swizzling shape: [M_padded, K_padded] - missing expert dimension
> > EPLB cannot handle this shape correctly
> > Can...
This fix has been merged into the hybrid-ep branch via https://github.com/deepseek-ai/DeepEP/pull/501.
This PR implements three features: (1) support for using a Mamba layer as mtp_model_layer; (2) splitting the MTP loss calculation in the GPT model’s forward pass into a separate function; (3)...
Hi, @yashaswikarnati, could you please add some documentation to this PR? For example, you could include some excerpts from the design document. https://docs.google.com/document/d/17FU3-CEAzob6lR40AY3LVCy57HHPsZZny8K4Z78l7ug/edit?tab=t.0#heading=h.bh7bns6bwmiq