Liger-Kernel

Efficient Triton Kernels for LLM Training

Results 163 Liger-Kernel issues

I have a question regarding the qwen2_vl MRope. My understanding is as follows: ``` full_cos = torch.cat([cos_halfdim, cos_halfdim], dim=-1) full_sin = torch.cat([sin_halfdim, sin_halfdim], dim=-1) ``` However from the...

question

## Summary ## Testing Done - Hardware Type: - [x] run `make test` to ensure correctness - [x] run `make checkstyle` to ensure code style - [x] run `make test-convergence`...

### 🐛 Describe the bug Hello, I'm fine-tuning Qwen-2.5-VL-3B with trl. When I turn on the Liger kernel, it can lead to poor performance, while the performance is normal without...

### 🐛 Describe the bug FYI: we're seeing a regression in the layernorm backward kernel on shapes 4096 < x

Hi, thanks for your work. When I run Qwen2.5-VL using GRPO with the Liger kernel in ms-swift, the following happens: But when I set --use_liger_kernel to false, it runs correctly. My...

### 🚀 The feature, motivation and pitch It shows better results than Qwen3 in the benchmark table. It has both large and small models. Adding this could be great. Blog:...

I'm a beginner. Can you please explain the difference between Triton code written for Liger kernels and Triton code generated by torch.compile? I feel like they seem the same, or...

Thanks for your work! I found that the Liger kernel is not applied to the MLP in the ViT of Qwen2.5-VL. What is the reasoning behind this? Thanks!

### 🐛 Describe the bug I tried running `make test-convergence` on a single A100 GPU and got a failure like this: > FAILED test/convergence/fp32/test_mini_models.py::test_mini_model[mini_gemma3_text-32-0.0001-dtype3-1e-08-0.0001-0.005-1e-05-0.005-1e-05] - AssertionError: Number of mismatched elements: 11...

## Summary Using `self.vocab_size` for the multimodal forward likely never worked, or was deprecated in a transformers change.