fix: don't drop kwargs from huggingface forward

Open llllvvuu opened this issue 6 months ago • 2 comments

Summary

The HuggingFace model forward passes extra kwargs through to the underlying model, for example in Qwen2: https://github.com/huggingface/transformers/blob/716819b8309324302e00a3488a3c3d6faa427f79/src/transformers/models/qwen2/modeling_qwen2.py#L712

Passing them through makes it possible to compute the FlashAttention kwargs once, outside of the forward, instead of recomputing them in every attention layer. That per-layer recomputation causes a number of issues: https://github.com/huggingface/transformers/issues/35588
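A minimal sketch of the difference, using toy stand-ins rather than the real transformers or Liger-Kernel classes. Every name here (`ToyAttentionLayer`, `ToyModel`, the two `patched_forward_*` functions, and the `cu_seq_lens` kwarg) is hypothetical and only illustrates the pattern: a wrapper that accepts `**kwargs` but does not forward them makes every layer fall back to deriving the metadata itself.

```python
# Toy illustration only: these classes are NOT the transformers or Liger-Kernel API.

class ToyAttentionLayer:
    def __init__(self):
        self.recompute_count = 0  # how often this layer had to derive metadata itself

    def forward(self, hidden, cu_seq_lens=None):
        if cu_seq_lens is None:
            # Fallback path: no precomputed metadata arrived, so derive it here.
            self.recompute_count += 1
            cu_seq_lens = [0, len(hidden)]
        return hidden  # the attention math itself is omitted in this sketch


class ToyModel:
    def __init__(self, num_layers=4):
        self.layers = [ToyAttentionLayer() for _ in range(num_layers)]

    def forward(self, hidden, **kwargs):
        # The base model hands extra kwargs to every layer.
        for layer in self.layers:
            hidden = layer.forward(hidden, **kwargs)
        return hidden


def patched_forward_dropping_kwargs(model, hidden, labels=None, **kwargs):
    # Problem pattern: kwargs are accepted but never forwarded.
    return model.forward(hidden)


def patched_forward_passing_kwargs(model, hidden, labels=None, **kwargs):
    # Fixed pattern: kwargs reach the base model and, through it, each layer.
    return model.forward(hidden, **kwargs)


def total_recomputes(model):
    return sum(layer.recompute_count for layer in model.layers)


precomputed = {"cu_seq_lens": [0, 3, 5]}  # computed once, outside the forward
hidden = [0.0] * 5

dropping = ToyModel()
patched_forward_dropping_kwargs(dropping, hidden, **precomputed)

passing = ToyModel()
patched_forward_passing_kwargs(passing, hidden, **precomputed)

print(total_recomputes(dropping), total_recomputes(passing))  # prints "4 0"
```

In the dropping variant all four toy layers take the fallback path, while the pass-through variant reuses the metadata computed by the caller.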

Testing Done

  • Hardware Type: H100
  • [ ] run make test to ensure correctness
  • [x] run make checkstyle to ensure code style
  • [ ] run make test-convergence to ensure convergence

llllvvuu avatar May 11 '25 12:05 llllvvuu

LGTM, @llllvvuu as soon as the merge conflict is resolved we can get this in

yundai424 avatar May 22 '25 21:05 yundai424

Done

llllvvuu avatar May 25 '25 07:05 llllvvuu