Liger-Kernel [Model] DeepseekV2 Support

Open saurabhkoshatwar opened this issue 11 months ago • 1 comments

Summary

Resolves #129. Adds a monkeypatch to support the DeepseekV2 model.

Details

Ops patched:

rms_norm
swiglu
cross_entropy
fused_linear_cross_entropy
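
For readers unfamiliar with the approach, here is a minimal sketch of what "monkeypatching an op" means. It is not the code in this PR: `modeling`, `ReferenceRMSNorm`, `FastRMSNorm`, and `apply_patch` are hypothetical stand-ins, and the norm omits the learnable weight to stay short. The idea is that the patch rebinds a name in the model's module, so any layer constructed afterwards picks up the replacement.

```python
import types

# Hypothetical stand-in for a model's modeling module (not the real DeepseekV2 code).
modeling = types.SimpleNamespace()


class ReferenceRMSNorm:
    """Plain RMSNorm over a list of floats (learnable weight omitted for brevity)."""

    def __init__(self, eps=1e-6):
        self.eps = eps

    def __call__(self, xs):
        mean_sq = sum(x * x for x in xs) / len(xs)
        return [x / (mean_sq + self.eps) ** 0.5 for x in xs]


class FastRMSNorm(ReferenceRMSNorm):
    """Placeholder for an optimized kernel exposing the same interface."""


modeling.RMSNorm = ReferenceRMSNorm


def apply_patch(module, rms_norm=True):
    """Rebind the module-level name so later instantiations use the replacement."""
    if rms_norm:
        module.RMSNorm = FastRMSNorm


apply_patch(modeling)
norm = modeling.RMSNorm()  # now constructs FastRMSNorm
out = norm([3.0, 4.0])
```

Because the replacement keeps the same constructor and call signature, code that instantiates `modeling.RMSNorm` does not need to change. The patch has to run before the model is built, since already-constructed layers keep their original class.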

Testing Done

Hardware Type: NVIDIA A100-SXM4-40GB
[x] run make test to ensure correctness
[x] run make checkstyle to ensure code style
[x] run make test-convergence to ensure convergence

Dec 26 '24 00:12 saurabhkoshatwar

@ByronHsu @yundai424 @Tcc0403 @qingquansong As discussed in the issue, the RoPE implementation in DeepSeek differs from the Llama one.

deepseek:

    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)

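    # De-interleave the last dim: [x0, y0, x1, y1, ...] -> [x0, x1, ..., y0, y1, ...]
    b, h, s, d = q.shape
    q = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    # Same de-interleaving for the keys
    b, h, s, d = k.shape
    k = k.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)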

    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

llama:

    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

I will create a separate PR to implement the DeepSeek RoPE.
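
To make the difference between the two snippets concrete, here is a small pure-Python sketch on a single head vector. It is only an illustration, not the planned kernel: `deinterleave`, `apply_rope_llama`, and `apply_rope_deepseek` are names chosen for this example, and `rotate_half` is assumed to follow the usual half-split definition (negate the second half and move it to the front). The only DeepSeek-specific step is the de-interleaving of the last dimension before the standard rotation.

```python
def deinterleave(vec):
    """List equivalent of view(d // 2, 2).transpose(-1, -2).reshape(d) on the last dim."""
    return vec[0::2] + vec[1::2]


def rotate_half(vec):
    """Assumed half-split rotation: [a, b] -> [-b, a], where a and b are the two halves."""
    half = len(vec) // 2
    return [-x for x in vec[half:]] + vec[:half]


def apply_rope_llama(vec, cos, sin):
    """Llama-style RoPE on one vector: v * cos + rotate_half(v) * sin."""
    rot = rotate_half(vec)
    return [v * c + r * s for v, r, c, s in zip(vec, rot, cos, sin)]


def apply_rope_deepseek(vec, cos, sin):
    """DeepSeek-style RoPE: de-interleave first, then apply the same rotation."""
    return apply_rope_llama(deinterleave(vec), cos, sin)


q = [0, 1, 2, 3, 4, 5]
q_perm = deinterleave(q)        # even-indexed entries first, then odd-indexed
q_rot = rotate_half(q_perm)

# With cos = 1 and sin = 0 the rotation is the identity, so only the permutation remains.
identity_case = apply_rope_deepseek([0, 1, 2, 3], [1] * 4, [0] * 4)
```

Seen this way, a Llama-style RoPE kernel cannot be reused as-is for DeepSeek, because the inputs arrive in an interleaved layout and must be permuted before the rotation.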

Jan 07 '25 01:01 saurabhkoshatwar
