Liger-Kernel issues

[QnA]: Why `cos` and `sin` is expected to be `hdim`, not `hdim//2`?

3

I have a question regarding the qwen2_vl MRope. My understanding is as follows:

```
full_cos = torch.cat([cos_halfdim, cos_halfdim], dim=-1)
full_sin = torch.cat([sin_halfdim, sin_halfdim], dim=-1)
```

However, from the...

tjtanaa

question
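A minimal NumPy sketch of why an elementwise RoPE application consumes a full-width (`hdim`) `cos`/`sin`. It assumes the Hugging Face-style `rotate_half` convention, in which the per-pair frequencies are duplicated with `cat([freqs, freqs])`. It covers plain 1D RoPE only, not the multimodal section split of MRope, and the names `rope_full_dim` and `rope_half_dim` are hypothetical. Both functions rotate the pair `(x[i], x[i + hdim//2])` by the same angle, so the half-width and duplicated full-width forms agree; the full-width form just lets the rotation be written as one elementwise expression over the whole head dimension.

```python
import numpy as np


def rotate_half(x):
    # Split the last dim into halves [x1, x2] and return [-x2, x1]
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([-x2, x1], axis=-1)


def inv_freqs(hdim, base=10000.0):
    # One frequency per rotated pair, so hdim // 2 values
    return 1.0 / base ** (np.arange(0, hdim, 2) / hdim)


def rope_full_dim(x, pos):
    # Assumed HF-style layout: cos/sin have length hdim (half-dim values duplicated)
    freqs = pos * inv_freqs(x.shape[-1])
    emb = np.concatenate([freqs, freqs], axis=-1)
    cos, sin = np.cos(emb), np.sin(emb)
    return x * cos + rotate_half(x) * sin


def rope_half_dim(x, pos):
    # Same rotation written explicitly with half-width cos/sin
    half = x.shape[-1] // 2
    freqs = pos * inv_freqs(x.shape[-1])
    cos_h, sin_h = np.cos(freqs), np.sin(freqs)
    x1, x2 = x[..., :half], x[..., half:]
    out1 = x1 * cos_h - x2 * sin_h
    out2 = x2 * cos_h + x1 * sin_h
    return np.concatenate([out1, out2], axis=-1)


x = np.random.default_rng(0).standard_normal(8)
print(np.allclose(rope_full_dim(x, 3), rope_half_dim(x, 3)))
```

Because each pair is a pure rotation, the output also keeps the input's norm, which is a cheap sanity check for either layout.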

fixed_fused_linear_cross_entropy should pass through kwargs

4

## Summary

## Testing Done
- Hardware Type:
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence`...

npuichigo

fused_linear_cross_entropy caused bad performance

### 🐛 Describe the bug

Hello, I'm fine-tuning Qwen-2.5-VL-3B with trl. When I turn on the Liger kernel, performance degrades, while performance is normal without...

waynechu1021
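As background for the report above: a fused linear cross entropy avoids materializing the full `(num_tokens, vocab_size)` logits matrix by processing tokens in chunks. The NumPy sketch below uses hypothetical names, computes the forward loss only (no `ignore_index`, no gradients), and is not Liger's Triton implementation. It only checks that a chunked computation matches the unfused reference; a comparison like this on real hidden states is one way to separate a numerical discrepancy from a training-dynamics difference.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax over the last (vocab) axis
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def linear_ce_reference(h, W, y):
    # Unfused: materializes the full (num_tokens, vocab_size) logits matrix
    logits = h @ W.T
    return -log_softmax(logits)[np.arange(len(y)), y].mean()


def linear_ce_chunked(h, W, y, chunk_size=4):
    # Chunked: only a (chunk_size, vocab_size) logits block exists at a time
    total = 0.0
    for start in range(0, len(y), chunk_size):
        hc = h[start:start + chunk_size]
        yc = y[start:start + chunk_size]
        logits = hc @ W.T
        total += -log_softmax(logits)[np.arange(len(yc)), yc].sum()
    # Divide by the total token count (not per chunk) to match the mean
    return total / len(y)


rng = np.random.default_rng(0)
h = rng.standard_normal((10, 6))      # hidden states
W = rng.standard_normal((20, 6))      # lm_head weight, vocab_size = 20
y = rng.integers(0, 20, size=10)      # target token ids
print(linear_ce_reference(h, W, y), linear_ce_chunked(h, W, y, chunk_size=3))
```

Normalizing once by the total token count matters: averaging per chunk and then averaging the chunk means would weight a short final chunk incorrectly.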

Layernorm backward regression in triton 3.4.0 release candidate

1

### 🐛 Describe the bug

FYI: we're seeing a regression in the layernorm backward kernel on shapes 4096 < x

davidberard98
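For context on what a layernorm backward kernel computes, here is a float64 NumPy reference, assuming the standard LayerNorm with biased variance, an affine weight `w`, and a bias `b`. With `g = dy * w`, the input gradient is `dx = rstd * (g - mean(g) - xhat * mean(g * xhat))`, taken over the normalized axis. The helper names are hypothetical and this is neither Liger's nor Triton's code; `numeric_grad_x` is a finite-difference check included only to confirm the analytic `dx`. A kernel's output on the affected shapes could be compared against a reference like this.

```python
import numpy as np


def layernorm_forward(x, w, b, eps=1e-5):
    # Normalize over the last axis using the biased variance
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    rstd = 1.0 / np.sqrt(var + eps)
    xhat = (x - mu) * rstd
    return xhat * w + b, xhat, rstd


def layernorm_backward(dy, xhat, rstd, w):
    # Gradient w.r.t. the normalized input, then through the normalization
    g = dy * w
    dx = rstd * (
        g
        - g.mean(axis=-1, keepdims=True)
        - xhat * (g * xhat).mean(axis=-1, keepdims=True)
    )
    # Parameter gradients are reduced over all rows
    dw = (dy * xhat).sum(axis=0)
    db = dy.sum(axis=0)
    return dx, dw, db


def numeric_grad_x(x, w, b, dy, h=1e-6):
    # Central finite differences of sum(y * dy) with respect to x
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        xp, xm = x.copy(), x.copy()
        xp[idx] += h
        xm[idx] -= h
        fp = (layernorm_forward(xp, w, b)[0] * dy).sum()
        fm = (layernorm_forward(xm, w, b)[0] * dy).sum()
        grad[idx] = (fp - fm) / (2 * h)
    return grad


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
w, b = rng.standard_normal(16), rng.standard_normal(16)
dy = rng.standard_normal((4, 16))
y, xhat, rstd = layernorm_forward(x, w, b)
dx, dw, db = layernorm_backward(dy, xhat, rstd, w)
print(np.abs(dx - numeric_grad_x(x, w, b, dy)).max())
```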

Problem when running GRPO with liger kernel.

3

Hi, thanks for your work. When I run Qwen2.5-VL using GRPO with the Liger kernel in ms-swift, the following happens:

But when I set `--use_liger_kernel` to false, it runs correctly. My...

Leon1207

Add support for the LFM2 model

### 🚀 The feature, motivation and pitch

It shows better results than Qwen3 in the benchmark table, and it comes in both large and small sizes. Adding support would be great. Blog:...

kadirnar

what is the difference between Triton code written for Liger kernels and generated by torch.compile

I'm a beginner. Can you please explain the difference between the Triton code written for Liger kernels and the Triton code generated by torch.compile? They seem the same to me, or...

mumu029

Why do not apply liger kernal to mlp in the ViT of Qwen2.5-VL?

4

Thanks for your work! I noticed that the Liger kernel is not applied to the MLP in the ViT of Qwen2.5-VL. What is the reasoning behind this? Thanks!

Leon1207

make test-convergence get Number of mismatched elements

7

### 🐛 Describe the bug

I tried running `make test-convergence` on a single A100 GPU and got a failure like this:

> FAILED test/convergence/fp32/test_mini_models.py::test_mini_model[mini_gemma3_text-32-0.0001-dtype3-1e-08-0.0001-0.005-1e-05-0.005-1e-05] - AssertionError: Number of mismatched elements: 11...

Dexterai
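The failure message reports how many elements fall outside the tolerance. Assuming the usual elementwise criterion `|actual - expected| <= atol + rtol * |expected|` used by `torch.allclose` and `numpy.isclose`, the hypothetical helper below lists the failing indices in pure Python. The test ID above appears to encode several tolerance values (`0.0001`, `1e-08`, `0.005`, `1e-05`), so whether a handful of elements fail depends on both the numeric drift on a given GPU and how tight those values are.

```python
import math


def mismatched_indices(actual, expected, rtol, atol):
    """Return indices where |actual - expected| > atol + rtol * |expected|.

    NaN is treated as never close (like equal_nan=False).
    """
    bad = []
    for i, (a, e) in enumerate(zip(actual, expected)):
        if math.isnan(a) or math.isnan(e) or abs(a - e) > atol + rtol * abs(e):
            bad.append(i)
    return bad


actual = [1.0, 2.0, 3.0]
expected = [1.0, 2.001, 3.5]
print(len(mismatched_indices(actual, expected, rtol=1e-5, atol=1e-5)))  # → 2
print(len(mismatched_indices(actual, expected, rtol=0.2, atol=0.0)))    # → 0
```

The same two tensors pass or fail depending only on `rtol` and `atol`, which is why a small mismatch count on different hardware is often a tolerance question before it is a correctness one.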

fix vocab_size path for gemma3

1

## Summary

Using `self.vocab_size` for the multimodal forward likely never worked, or it was deprecated in a transformers change.

winglian
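One plausible shape for such a fix, sketched with `SimpleNamespace` stand-ins rather than real `transformers` config classes: many multimodal configs nest the language-model settings under a `text_config`, so a hypothetical `resolve_vocab_size` helper prefers that nested value and falls back to a top-level `vocab_size`. This is an assumption about the general pattern, not the change made in this PR.

```python
from types import SimpleNamespace


def resolve_vocab_size(config):
    # Hypothetical helper: prefer a nested text_config.vocab_size when present
    text_config = getattr(config, "text_config", None)
    vocab_size = getattr(text_config, "vocab_size", None)
    if vocab_size is not None:
        return vocab_size
    # Fall back to a flat, text-only config
    return config.vocab_size


multimodal_cfg = SimpleNamespace(text_config=SimpleNamespace(vocab_size=1000))
text_only_cfg = SimpleNamespace(vocab_size=500)
print(resolve_vocab_size(multimodal_cfg), resolve_vocab_size(text_only_cfg))  # → 1000 500
```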
