Liger-Kernel
fix: don't drop kwargs from huggingface forward
Summary
The HuggingFace forward passes kwargs through: https://github.com/huggingface/transformers/blob/716819b8309324302e00a3488a3c3d6faa427f79/src/transformers/models/qwen2/modeling_qwen2.py#L712
Preserving these kwargs matters because it lets FlashAttention kwargs be computed once outside the forward, so they are not recomputed on every attention layer, which causes a number of issues: https://github.com/huggingface/transformers/issues/35588
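For context, a minimal sketch of the pattern (illustrative names, not the exact Liger-Kernel patch): the patched forward accepts `**kwargs` and forwards them to the inner model rather than silently dropping them.

```python
from typing import Optional

import torch


def patched_forward(
    self,
    input_ids: torch.LongTensor = None,
    attention_mask: Optional[torch.Tensor] = None,
    labels: Optional[torch.LongTensor] = None,
    **kwargs,  # e.g. FlashAttention kwargs precomputed by the caller
):
    # Forward **kwargs to the inner model, mirroring upstream transformers.
    # If they are dropped here, every attention layer has to recompute the
    # FlashAttention metadata the caller already prepared.
    outputs = self.model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        **kwargs,
    )
    return outputs
```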
Testing Done
- Hardware Type: H100
- [ ] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
LGTM, @llllvvuu as soon as the merge conflict is resolved we can get this in
Done