Liger-Kernel

[Model] DeepseekV2 Support

Open saurabhkoshatwar opened this issue 11 months ago • 1 comments

Summary

Resolves #129. Adds a monkey patch to support the DeepseekV2 model.

Details

Ops patched:

  • rms_norm
  • swiglu
  • cross_entropy
  • fused_linear_cross_entropy
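
As a rough illustration of what "patching" an op means here, the following is a minimal sketch of the general monkey-patch pattern: a module-level class is swapped for an optimized one before the model is built. Every name in it (`modeling`, `ReferenceRMSNorm`, `FusedRMSNorm`, `apply_patch`) is hypothetical and chosen only for illustration; it does not show the actual Liger-Kernel or DeepseekV2 code.

```python
import types

# Hypothetical stand-in for a model's modeling module (not the real one).
modeling = types.SimpleNamespace()


class ReferenceRMSNorm:
    """Stand-in for the stock RMSNorm layer (no learned weight, for brevity)."""

    def __init__(self, eps=1e-6):
        self.eps = eps

    def __call__(self, values):
        # Scale by the reciprocal root-mean-square of the inputs.
        mean_sq = sum(v * v for v in values) / len(values)
        scale = (mean_sq + self.eps) ** -0.5
        return [v * scale for v in values]


class FusedRMSNorm(ReferenceRMSNorm):
    """Stand-in for an optimized kernel: same math, different class."""

    backend = "fused"


modeling.RMSNorm = ReferenceRMSNorm


def apply_patch(module, rms_norm=True):
    """Rebind the module attribute so layers created afterwards use the new class."""
    if rms_norm:
        module.RMSNorm = FusedRMSNorm


apply_patch(modeling)
layer = modeling.RMSNorm()   # now constructs FusedRMSNorm
out = layer([3.0, 4.0])
```

The patch has to run before the model is instantiated, because layers that already exist keep the class they were built with.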

Testing Done

  • Hardware Type: NVIDIA A100-SXM4-40GB
  • [x] run make test to ensure correctness
  • [x] run make checkstyle to ensure code style
  • [x] run make test-convergence to ensure convergence

saurabhkoshatwar, Dec 26 '24 00:12

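@ByronHsu @yundai424 @Tcc0403 @qingquansong As discussed in the issue, the RoPE implementation in DeepSeek differs from Llama's: DeepSeek indexes cos and sin by position_ids and reorders the last dimension of q and k before applying rotate_half.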

deepseek:

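    # Select the cos/sin rows for each position, then insert a broadcast dimension
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)

    # Reorder the last dim of q from [x0, x1, x2, x3, ...] to [x0, x2, ..., x1, x3, ...]
    b, h, s, d = q.shape
    q = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    # Same reordering for k
    b, h, s, d = k.shape
    k = k.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    # From here on the computation matches the Llama path
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed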

llama:

    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
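
One way to read the extra view/transpose/reshape step in the DeepSeek snippet is as a conversion from an interleaved pair layout to the half-split layout that rotate_half expects. The pure-Python sketch below checks this on a single vector. It assumes that cos and sin hold each angle twice (first half and second half), and that rotate_half returns the negated second half followed by the first half; the helper names are hypothetical and this is not the kernel the follow-up PR would implement.

```python
import math


def deinterleave(x):
    """Reorder [x0, x1, x2, x3, ...] into [x0, x2, ..., x1, x3, ...].

    Same effect on the last dimension as view(..., d // 2, 2),
    transpose of the last two dims, then reshape(..., d).
    """
    return x[0::2] + x[1::2]


def rotate_half(x):
    """Return the negated second half followed by the first half."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def rope_half_split(x, angles):
    """Llama-style application: x * cos + rotate_half(x) * sin."""
    cos = [math.cos(a) for a in angles] * 2  # assumed layout: angles repeated twice
    sin = [math.sin(a) for a in angles] * 2
    rot = rotate_half(x)
    return [xi * c + ri * s for xi, c, ri, s in zip(x, cos, rot, sin)]


def rope_interleaved(x, angles):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by angles[i]."""
    out = []
    for i, a in enumerate(angles):
        x0, x1 = x[2 * i], x[2 * i + 1]
        out += [x0 * math.cos(a) - x1 * math.sin(a),
                x0 * math.sin(a) + x1 * math.cos(a)]
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
angles = [0.1, 0.7, 1.3]

# DeepSeek-style path: reorder first, then apply the half-split formula.
lhs = rope_half_split(deinterleave(x), angles)
# Pairwise rotation on the original layout, reordered afterwards for comparison.
rhs = deinterleave(rope_interleaved(x, angles))
```

Under these assumptions `lhs` and `rhs` agree element by element, so the DeepSeek path produces a pairwise rotation whose output is stored in half-split order. That reordering is what a Llama-only RoPE kernel would not cover.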

I will create a separate PR to implement the DeepSeek RoPE variant.

saurabhkoshatwar, Jan 07 '25 01:01