
Efficient Triton Kernels for LLM Training

Results: 114 Liger-Kernel issues

### 🚀 The feature, motivation and pitch Model code here -- https://github.com/huggingface/transformers/blob/main/src/transformers/models/jamba/modeling_jamba.py -- it might be interesting to see how a Triton implementation of the mixer forward compares to the existing CUDA forward...

feature

### 🚀 The feature, motivation and pitch FP8 training has been a great weapon on H100, providing huge memory and speed benefits, and has been shown to be effective (with...

feature
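To make the FP8 pitch concrete, here is a minimal pure-Python sketch (not Liger-Kernel or Transformer Engine code) of the per-tensor scaling idea behind FP8 E4M3 training: map the tensor's largest magnitude to the E4M3 maximum (448), quantize coarsely, and dequantize with the stored scale. The mantissa rounding here is a crude stand-in, only meant to show that large values survive while very small ones lose precision.

```python
# Sketch of per-tensor FP8 (E4M3) scaling, simulated in pure Python.
# Not Liger-Kernel code: real FP8 training uses hardware casts on H100;
# this only illustrates the scale/clamp/round idea behind the savings.

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_fp8_sim(values):
    """Scale a list of floats into FP8 range; return (quantized, scale)."""
    amax = max(abs(v) for v in values) or 1.0
    scale = E4M3_MAX / amax          # map the largest magnitude to E4M3_MAX
    scaled = [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]
    # Crude stand-in for E4M3's ~3 mantissa bits of precision:
    quantized = [round(s * 8) / 8 for s in scaled]
    return quantized, scale

def dequantize_fp8_sim(quantized, scale):
    return [q / scale for q in quantized]

x = [0.1, -2.5, 300.0, 0.004]
q, s = quantize_fp8_sim(x)
x_hat = dequantize_fp8_sim(q, s)
```

Note how the largest value round-trips almost exactly while the smallest underflows to zero, which is why FP8 recipes track per-tensor (or finer-grained) scales.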

### 🚀 The feature, motivation and pitch W8A8 (int8 for both weights and activations) matmul is beneficial on A100 and could provide great memory and speed benefits, and could be...

feature
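A minimal pure-Python sketch (not Liger-Kernel code) of the W8A8 idea described above: both the weight and the activation are quantized to int8 with symmetric per-tensor scales, the dot product accumulates in integer arithmetic (int32 on hardware), and a single dequantization with the product of the two scales recovers a float result.

```python
# Sketch of W8A8 symmetric quantization: int8 weights * int8 activations,
# integer accumulation, then one dequantization at the end.
# Illustrative only -- a real kernel does this tiled on the GPU.

def quantize_int8(vec):
    """Symmetric per-tensor quantization of a list of floats to int8."""
    amax = max(abs(v) for v in vec) or 1.0
    scale = amax / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in vec]
    return q, scale

def w8a8_dot(w, a):
    """Quantized dot product: int8 x int8 -> int accumulate -> float."""
    qw, sw = quantize_int8(w)
    qa, sa = quantize_int8(a)
    acc = sum(x * y for x, y in zip(qw, qa))  # int32-style accumulation
    return acc * sw * sa                       # dequantize once at the end

w = [0.5, -1.0, 2.0]
a = [1.0, 0.25, -0.5]
exact = sum(x * y for x, y in zip(w, a))   # -0.75
approx = w8a8_dot(w, a)
```

The memory win comes from storing int8 instead of fp16/bf16; the speed win comes from int8 tensor-core throughput on A100.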

## Summary https://github.com/linkedin/Liger-Kernel/issues/733 ## Testing Done Tested the attention layer and attention module implementations for FusedNeighborhoodAttention - Hardware Type: 3090 & H100 SXM5 - [x] run `make test` to ensure correctness...

### 🐛 Describe the bug Most failures are related to transformers VLM changes ## unit test qwen2vl_mrope - [x] test_qwen2vl_mrope https://github.com/linkedin/Liger-Kernel/pull/728 monkey patch - [ ] test_monkey_patch::test_apply_liger_kernel_to_instance_for_mllama_for_conditional_generation - [x] test_monkey_patch::test_apply_liger_kernel_to_instance_for_gemma3...

### 🚀 The feature, motivation and pitch New work from Prof. Dao's lab that improves on DeepSeek's original Multi-head Latent Attention. Relevant paper: https://arxiv.org/pdf/2505.21487 ### Alternatives _No response_ ### Additional...

## Summary The HuggingFace forward passes kwargs through: https://github.com/huggingface/transformers/blob/716819b8309324302e00a3488a3c3d6faa427f79/src/transformers/models/qwen2/modeling_qwen2.py#L712 This is important for computing FlashAttention kwargs outside of the forward, so that they are not recomputed in every attention layer, which causes...
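A toy sketch of the pattern this PR relies on (hypothetical names, not the transformers API): expensive attention metadata, such as FlashAttention's variable-length sequence offsets, is prepared once per model forward and threaded through the kwargs to every layer, instead of being rebuilt inside each attention layer.

```python
# Sketch of "compute attention kwargs once, pass them through every layer".
# All names here (prepare_attn_kwargs, attention_layer, model_forward) are
# hypothetical stand-ins, not real transformers or Liger-Kernel functions.

calls = {"prepare": 0}

def prepare_attn_kwargs(attention_mask):
    """Stand-in for deriving varlen metadata from a padding mask."""
    calls["prepare"] += 1
    seqlens = [sum(row) for row in attention_mask]
    return {"seqlens": seqlens}

def attention_layer(hidden, **attn_kwargs):
    # A real layer would call FlashAttention with attn_kwargs; here we
    # only check that the precomputed metadata arrived.
    assert "seqlens" in attn_kwargs
    return hidden

def model_forward(hidden, attention_mask, num_layers=4):
    attn_kwargs = prepare_attn_kwargs(attention_mask)  # computed once
    for _ in range(num_layers):
        hidden = attention_layer(hidden, **attn_kwargs)
    return hidden

mask = [[1, 1, 1, 0], [1, 1, 0, 0]]
model_forward([[0.0] * 4] * 2, mask)
```

With kwargs passed through, the preparation runs once per forward rather than once per layer, which matters for deep models.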

### 🚀 The feature, motivation and pitch Interesting work around efficient attention and general sparse attention. Reference paper with fused NATTEN implementation in cutlass: https://arxiv.org/pdf/2504.16922 Relevant code: https://github.com/SHI-Labs/NATTEN/tree/main/csrc/include/natten/cuda/fna https://github.com/SHI-Labs/NATTEN/blob/main/csrc/include/natten/cuda/fna/kernel_forward.h https://github.com/SHI-Labs/NATTEN/blob/main/csrc/include/natten/cuda/fna/kernel_backward.h...
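To illustrate the locality structure a fused neighborhood-attention kernel exploits, here is a simplified 1-D masking sketch (not NATTEN's implementation, which additionally shifts windows at sequence boundaries so every token keeps a fixed neighborhood size): each query attends only to keys within a fixed window around its own position.

```python
# Simplified 1-D neighborhood attention mask: query i may attend to key j
# only if |i - j| <= window. NATTEN's real kernels also handle boundary
# shifting and 2-D/3-D neighborhoods; this is only the core locality idea.

def neighborhood_mask(seq_len, window):
    """Boolean mask: mask[i][j] is True iff j is within `window` of i."""
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]

mask = neighborhood_mask(seq_len=5, window=1)
# e.g. position 2 attends to positions {1, 2, 3}
```

Because each row has only O(window) true entries, a fused kernel can skip the masked work entirely instead of materializing a dense attention matrix.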

## Summary ## Testing Done - Hardware Type: RTX 3090 - [x] run `make test` to ensure correctness - [x] run `make checkstyle` to ensure code style - [x] run...

## Summary @Tcc0403 Background: https://github.com/linkedin/Liger-Kernel/pull/524#issuecomment-2748651838 While I was working on PR #524, the following error occurred in the PaliGemma section, and I investigated the cause. ``` The language_model...