Liger-Kernel
Efficient Triton Kernels for LLM Training
I have a question regarding the qwen2_vl MRope. My understanding is as follows: ``` full_cos = torch.cat([cos_halfdim, cos_halfdim], dim=-1) full_sin = torch.cat([sin_halfdim, sin_halfdim], dim=-1) ``` However from the...
## Summary ## Testing Done - Hardware Type: - [x] run `make test` to ensure correctness - [x] run `make checkstyle` to ensure code style - [x] run `make test-convergence`...
### 🐛 Describe the bug Hello, I'm fine-tuning Qwen-2.5-VL-3B with trl. When I turn on the Liger kernel, it leads to poor performance, while the performance is normal without...
### 🐛 Describe the bug FYI: we're seeing a regression in the layernorm backward kernel on shapes 4096 < x
Hi, thanks for your work. When I run Qwen2.5-VL using GRPO with the Liger kernel in ms-swift, the run fails. But when I set --use_liger_kernel to false, it runs correctly. My...
### 🚀 The feature, motivation and pitch It shows better results than Qwen3 in the benchmark table. It has both large and small models. Adding this could be great. Blog:...
I'm a beginner. Can you please explain the difference between Triton code written for Liger kernels and Triton code generated by torch.compile? I feel like they seem the same, or...
Thanks for your work! I found that the Liger kernel is not applied to the MLP in the ViT of Qwen2.5-VL. What is the reasoning behind this? Thanks!
### 🐛 Describe the bug I tried running `make test-convergence` on a single A100 GPU and got a failure like this: > FAILED test/convergence/fp32/test_mini_models.py::test_mini_model[mini_gemma3_text-32-0.0001-dtype3-1e-08-0.0001-0.005-1e-05-0.005-1e-05] - AssertionError: Number of mismatched elements: 11...
## Summary Using `self.vocab_size` for the multimodal forward likely never worked or was deprecated in a transformers change.