
Efficient Triton Kernels for LLM Training

Results: 114 Liger-Kernel issues

### 🚀 The feature, motivation and pitch Model code here -- https://github.com/huggingface/transformers/blob/main/src/transformers/models/jamba/modeling_jamba.py -- it might be interesting to see how a Triton implementation of the mixer forward compares to the existing CUDA forward...

feature

### 🚀 The feature, motivation and pitch FP8 training has been a great weapon on H100, providing huge memory and speed benefits, and has been shown to be effective (with...

feature
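To make the FP8 pitch concrete, here is a minimal pure-Python sketch (not Liger-Kernel or Transformer Engine code) of the per-tensor scaling idea behind FP8 E4M3 training: map the tensor's largest magnitude to the E4M3 maximum (448), quantize coarsely, and dequantize with the stored scale. The mantissa rounding here is a crude stand-in, only meant to show that large values survive while very small ones lose precision.

```python
# Sketch of per-tensor FP8 (E4M3) scaling, simulated in pure Python.
# Not Liger-Kernel code: real FP8 training uses hardware casts on H100;
# this only illustrates the scale/clamp/round idea behind the savings.

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_fp8_sim(values):
    """Scale a list of floats into FP8 range; return (quantized, scale)."""
    amax = max(abs(v) for v in values) or 1.0
    scale = E4M3_MAX / amax          # map the largest magnitude to E4M3_MAX
    scaled = [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]
    # Crude stand-in for E4M3's ~3 mantissa bits of precision:
    quantized = [round(s * 8) / 8 for s in scaled]
    return quantized, scale

def dequantize_fp8_sim(quantized, scale):
    return [q / scale for q in quantized]

x = [0.1, -2.5, 300.0, 0.004]
q, s = quantize_fp8_sim(x)
x_hat = dequantize_fp8_sim(q, s)
```

Note how the largest value round-trips almost exactly while the smallest underflows to zero, which is why FP8 recipes track per-tensor (or finer-grained) scales.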

### 🚀 The feature, motivation and pitch W8A8 (int8 for both weights and activations) matmul is beneficial on A100 and could provide great memory and speed benefits, and could be...

feature
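A minimal pure-Python sketch (not Liger-Kernel code) of the W8A8 idea described above: both the weight and the activation are quantized to int8 with symmetric per-tensor scales, the dot product accumulates in integer arithmetic (int32 on hardware), and a single dequantization with the product of the two scales recovers a float result.

```python
# Sketch of W8A8 symmetric quantization: int8 weights * int8 activations,
# integer accumulation, then one dequantization at the end.
# Illustrative only -- a real kernel does this tiled on the GPU.

def quantize_int8(vec):
    """Symmetric per-tensor quantization of a list of floats to int8."""
    amax = max(abs(v) for v in vec) or 1.0
    scale = amax / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in vec]
    return q, scale

def w8a8_dot(w, a):
    """Quantized dot product: int8 x int8 -> int accumulate -> float."""
    qw, sw = quantize_int8(w)
    qa, sa = quantize_int8(a)
    acc = sum(x * y for x, y in zip(qw, qa))  # int32-style accumulation
    return acc * sw * sa                       # dequantize once at the end

w = [0.5, -1.0, 2.0]
a = [1.0, 0.25, -0.5]
exact = sum(x * y for x, y in zip(w, a))   # -0.75
approx = w8a8_dot(w, a)
```

The memory win comes from storing int8 instead of fp16/bf16; the speed win comes from int8 tensor-core throughput on A100.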

## Summary https://github.com/linkedin/Liger-Kernel/issues/733 ## Testing Done Tested the attention layer and attention module implementations for FusedNeighborhoodAttention - Hardware Type: 3090 & H100 SXM5 - [x] run `make test` to ensure correctness...

### 🐛 Describe the bug Most failures are related to transformers VLM changes ## unit test qwen2vl_mrope - [x] test_qwen2vl_mrope https://github.com/linkedin/Liger-Kernel/pull/728 monkey patch - [ ] test_monkey_patch::test_apply_liger_kernel_to_instance_for_mllama_for_conditional_generation - [x] test_monkey_patch::test_apply_liger_kernel_to_instance_for_gemma3...

### 🚀 The feature, motivation and pitch New work from Prof. Dao's lab that improves on DeepSeek's original Multi-head Latent Attention. Relevant paper: https://arxiv.org/pdf/2505.21487 ### Alternatives _No response_ ### Additional...

## Summary The HuggingFace forward passes kwargs through: https://github.com/huggingface/transformers/blob/716819b8309324302e00a3488a3c3d6faa427f79/src/transformers/models/qwen2/modeling_qwen2.py#L712 This is important for computing FlashAttention kwargs outside of the forward, so that they are not recomputed in every attention layer, which causes...
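A toy sketch of the pattern this PR relies on (hypothetical names, not the transformers API): expensive attention metadata, such as FlashAttention's variable-length sequence offsets, is prepared once per model forward and threaded through the kwargs to every layer, instead of being rebuilt inside each attention layer.

```python
# Sketch of "compute attention kwargs once, pass them through every layer".
# All names here (prepare_attn_kwargs, attention_layer, model_forward) are
# hypothetical stand-ins, not real transformers or Liger-Kernel functions.

calls = {"prepare": 0}

def prepare_attn_kwargs(attention_mask):
    """Stand-in for deriving varlen metadata from a padding mask."""
    calls["prepare"] += 1
    seqlens = [sum(row) for row in attention_mask]
    return {"seqlens": seqlens}

def attention_layer(hidden, **attn_kwargs):
    # A real layer would call FlashAttention with attn_kwargs; here we
    # only check that the precomputed metadata arrived.
    assert "seqlens" in attn_kwargs
    return hidden

def model_forward(hidden, attention_mask, num_layers=4):
    attn_kwargs = prepare_attn_kwargs(attention_mask)  # computed once
    for _ in range(num_layers):
        hidden = attention_layer(hidden, **attn_kwargs)
    return hidden

mask = [[1, 1, 1, 0], [1, 1, 0, 0]]
model_forward([[0.0] * 4] * 2, mask)
```

With kwargs passed through, the preparation runs once per forward rather than once per layer, which matters for deep models.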

### 🚀 The feature, motivation and pitch Interesting work around efficient attention and general sparse attention. Reference paper with fused NATTEN implementation in cutlass: https://arxiv.org/pdf/2504.16922 Relevant code: https://github.com/SHI-Labs/NATTEN/tree/main/csrc/include/natten/cuda/fna https://github.com/SHI-Labs/NATTEN/blob/main/csrc/include/natten/cuda/fna/kernel_forward.h https://github.com/SHI-Labs/NATTEN/blob/main/csrc/include/natten/cuda/fna/kernel_backward.h...
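To illustrate the locality structure a fused neighborhood-attention kernel exploits, here is a simplified 1-D masking sketch (not NATTEN's implementation, which additionally shifts windows at sequence boundaries so every token keeps a fixed neighborhood size): each query attends only to keys within a fixed window around its own position.

```python
# Simplified 1-D neighborhood attention mask: query i may attend to key j
# only if |i - j| <= window. NATTEN's real kernels also handle boundary
# shifting and 2-D/3-D neighborhoods; this is only the core locality idea.

def neighborhood_mask(seq_len, window):
    """Boolean mask: mask[i][j] is True iff j is within `window` of i."""
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]

mask = neighborhood_mask(seq_len=5, window=1)
# e.g. position 2 attends to positions {1, 2, 3}
```

Because each row has only O(window) true entries, a fused kernel can skip the masked work entirely instead of materializing a dense attention matrix.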

## Summary ## Testing Done - Hardware Type: RTX 3090 - [x] run `make test` to ensure correctness - [x] run `make checkstyle` to ensure code style - [x] run...

## Summary @Tcc0403 Background: https://github.com/linkedin/Liger-Kernel/pull/524#issuecomment-2748651838 While I was working on PR #524, the following error occurred in the PaliGemma section, and I investigated the cause. ``` The language_model...