Liger-Kernel
[Model] DeepseekV2 Support
Summary
Resolves #129. Adds a monkeypatch to support the DeepseekV2 model.
Details
Ops patched:
- rms_norm
- swiglu
- cross_entropy
- fused_linear_cross_entropy
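A monkeypatch of this kind works by swapping a model module's classes or functions for kernel-backed versions before the model is built. Below is a minimal, framework-free sketch of that mechanism for `rms_norm` only. The module `fake_modeling_deepseek`, the classes `ReferenceRMSNorm`/`PatchedRMSNorm`, and `apply_patch` are hypothetical names for illustration, not Liger-Kernel's actual API. The "patched" class simply reuses the reference math instead of calling a Triton kernel.

```python
import types

# Hypothetical stand-in for a remote-code module such as modeling_deepseek.py.
# All names here are illustrative, not Liger-Kernel's actual API.
modeling = types.ModuleType("fake_modeling_deepseek")


class ReferenceRMSNorm:
    """Plain-Python RMSNorm standing in for the model's original layer."""

    def __init__(self, hidden_size, eps=1e-6):
        self.weight = [1.0] * hidden_size
        self.eps = eps

    def __call__(self, x):
        # y = w * x / sqrt(mean(x^2) + eps)
        mean_sq = sum(v * v for v in x) / len(x)
        scale = (mean_sq + self.eps) ** -0.5
        return [w * v * scale for w, v in zip(self.weight, x)]


class PatchedRMSNorm(ReferenceRMSNorm):
    """Placeholder for a fused-kernel replacement; same math, different class."""


modeling.RMSNorm = ReferenceRMSNorm


def build_decoder_norm(hidden_size):
    # Model code looks the class up on the module at construction time,
    # which is what makes a module-level monkeypatch take effect.
    return modeling.RMSNorm(hidden_size)


def apply_patch(module):
    """Hypothetical patch function: swap the class before the model is built."""
    module.RMSNorm = PatchedRMSNorm


norm_before = build_decoder_norm(4)
apply_patch(modeling)
norm_after = build_decoder_norm(4)
print(type(norm_before).__name__, type(norm_after).__name__)  # → ReferenceRMSNorm PatchedRMSNorm
```

The patch only affects objects created after `apply_patch` runs. Instances built earlier keep the original class, which is why this style of patching has to happen before the model is instantiated.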
Testing Done
- Hardware Type: NVIDIA A100-SXM4-40GB
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence
@ByronHsu @yundai424 @Tcc0403 @qingquansong As discussed in the issue, DeepSeek's RoPE implementation differs from Llama's:
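DeepSeek:
```python
# rotate_half is the same helper that the Llama code uses.
def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    # cos/sin are full tables here, so they are gathered by position_ids first.
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)

    # Reorder the last dim from interleaved pairs to a half-split layout.
    b, h, s, d = q.shape
    q = q.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)
    b, h, s, d = k.shape
    k = k.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d)

    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```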
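The extra `view`/`transpose`/`reshape` on `q` and `k` is what sets the DeepSeek version apart. It moves the even-indexed features of each head into the first half of the last dimension and the odd-indexed ones into the second half. Because `rotate_half` pairs feature `j` with feature `j + d/2`, the rotation then acts on adjacent pairs `(x[2j], x[2j+1])` of the original layout. Here is a small NumPy sketch of just this reordering. It uses `reshape`/`swapaxes` as stand-ins for torch's `view`/`transpose`, and `deinterleave` is a hypothetical helper name.

```python
import numpy as np


def deinterleave(x):
    # NumPy mirror of x.view(b, h, s, d // 2, 2).transpose(4, 3).reshape(b, h, s, d):
    # split the last dim into (d/2, 2) pairs, swap those two axes, flatten back.
    b, h, s, d = x.shape
    return x.reshape(b, h, s, d // 2, 2).swapaxes(4, 3).reshape(b, h, s, d)


# Use each element's index as its value to see where it lands (head_dim = 8).
x = np.arange(8).reshape(1, 1, 1, 8)
print(deinterleave(x)[0, 0, 0])  # → [0 2 4 6 1 3 5 7]
```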
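Llama:
```python
def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    # cos/sin arrive already indexed per position, so only a broadcast dim is added.
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```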
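Comparing the two snippets: besides the reordering, DeepSeek gathers `cos`/`sin` from full tables with `position_ids`, while Llama receives them already gathered. The NumPy sketch below reimplements both formulas under a few assumptions:

- HF-style tables built with `base = 10000` and the frequencies repeated twice.
- A small random batch.
- Equal head dims for `q` and `k`.
- Hypothetical helper names `rope_tables`, `llama_rope`, and `deepseek_rope`.

It checks that the DeepSeek output equals a complex rotation of adjacent pairs `(x[2j], x[2j+1])`, with the real parts stored in the first half and the imaginary parts in the second.

```python
import numpy as np


def rotate_half(x):
    # Same idea as the HF helper: split the last dim in half, map (x1, x2) -> (-x2, x1).
    half = x.shape[-1] // 2
    return np.concatenate((-x[..., half:], x[..., :half]), axis=-1)


def rope_tables(max_len, dim, base=10000.0):
    # HF-style tables: frequencies repeated twice, giving shape (max_len, dim).
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    freqs = np.outer(np.arange(max_len), inv_freq)
    emb = np.concatenate((freqs, freqs), axis=-1)
    return np.cos(emb), np.sin(emb)


def llama_rope(q, k, cos, sin, unsqueeze_dim=1):
    # cos/sin already gathered per position: (batch, seq, dim) -> (batch, 1, seq, dim).
    cos = np.expand_dims(cos, unsqueeze_dim)
    sin = np.expand_dims(sin, unsqueeze_dim)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin


def deepseek_rope(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    # Same reordering as the view/transpose/reshape step, written as an index gather:
    # even feature indices first, then odd ones (assumes q and k share head_dim).
    d = q.shape[-1]
    perm = np.concatenate((np.arange(0, d, 2), np.arange(1, d, 2)))
    # Gather the full tables by position, then apply the Llama formula.
    return llama_rope(q[..., perm], k[..., perm],
                      cos[position_ids], sin[position_ids], unsqueeze_dim)


rng = np.random.default_rng(0)
b, heads, s, d = 2, 3, 5, 8
q = rng.standard_normal((b, heads, s, d))
k = rng.standard_normal((b, heads, s, d))
cos, sin = rope_tables(16, d)
position_ids = np.tile(np.arange(s), (b, 1))  # (batch, seq)

q_ds, k_ds = deepseek_rope(q, k, cos, sin, position_ids)

# Reference: rotate each adjacent pair (x[2j], x[2j+1]) as a complex number.
inv_freq = 1.0 / 10000.0 ** (np.arange(0, d, 2) / d)
angles = position_ids[:, None, :, None] * inv_freq            # (b, 1, s, d/2)
z = (q[..., 0::2] + 1j * q[..., 1::2]) * np.exp(1j * angles)
print(np.allclose(q_ds[..., : d // 2], z.real), np.allclose(q_ds[..., d // 2:], z.imag))  # → True True
```

Under these assumptions, the DeepSeek variant is the interleaved (adjacent-pair) form of RoPE with its output written in half-split order. A kernel built around Llama's layout would therefore need that reordering, or an equivalent indexing change, on its inputs.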
I will open a separate PR to implement DeepSeek's RoPE.