Liger-Kernel

Efficient Triton Kernels for LLM Training

Results 114 Liger-Kernel issues

I am trying to speed up inference and training of a `mistralai/Mistral-Small-3.1-24B-Instruct-2503` model. Simply replacing `AutoModelForCausalLM` with `AutoLigerKernelForCausalLM` does not improve my sampling speed or memory usage....

### 🚀 The feature, motivation and pitch Liger-Kernel looks for a dependency on `triton`, but on Windows Triton support is published under the name `triton-windows`. ### Alternatives Add support for Triton on Windows. ### Additional...
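The request above amounts to accepting more than one distribution name for the same dependency. A minimal sketch of that idea, using only the standard library: the helper name `find_installed_distribution` and its `get_version` parameter are hypothetical and are not part of Liger-Kernel, and the two candidate names are taken from the issue text.

```python
from importlib import metadata
from typing import Callable, Iterable, Optional

# Candidate distribution names come from the issue text above; the helper
# itself is a hypothetical illustration, not Liger-Kernel's actual check.
TRITON_CANDIDATES = ("triton", "triton-windows")


def find_installed_distribution(
    candidates: Iterable[str] = TRITON_CANDIDATES,
    get_version: Callable[[str], str] = metadata.version,
) -> Optional[str]:
    """Return the first candidate distribution that is installed, else None."""
    for name in candidates:
        try:
            # metadata.version raises PackageNotFoundError if `name` is absent.
            get_version(name)
        except metadata.PackageNotFoundError:
            continue
        return name
    return None
```

Calling `find_installed_distribution()` returns whichever of the two names is installed first in the candidate order, or `None` if neither is present. The `get_version` parameter exists only so the lookup can be replaced in tests.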

## Summary #623 ## Testing Done - Hardware Type: - [ ] run `make test` to ensure correctness - [ ] run `make checkstyle` to ensure code style - [...

### 🚀 The feature, motivation and pitch Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention https://arxiv.org/abs/2502.11089 Potentially useful Python reference https://github.com/dhcode-cpp/NSA-pytorch ### Alternatives _No response_ ### Additional context _No...

help wanted
feature
fun

## Summary Rerun all benchmark scripts to get the latest data, so we have a reliable baseline for future optimization. Note: ORPO is failing with `compile=True` (plotting with old data...

I'm training the Orpheus-TTS model using the transformers library. To speed it up, I'm using fsdp + sdpa + compile. However, when I tried liger-kernel for further acceleration, compile doesn't...

### 🐛 Describe the bug When the model is split across multiple GPUs using `device_map="auto"`, Liger-Kernel raises a `ValueError`. ### Reproduce ```import os os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' from transformers...

### 🐛 Describe the bug #### DeepSpeed ZeRO++ config - I ran the training with Slurm ``` { "zero_optimization": { "stage": 3, "stage3_gather_16bit_weights_on_model_save": true, "reduce_bucket_size": "auto", "zero_hpz_partition_size": 8, "zero_quantized_weights": true,...

### 🚀 The feature, motivation and pitch While working on PR #524, it became clear that the tests for patching Liger-Kernel onto model instances are insufficient. (It is recommended...

### 🐛 Describe the bug Many discussions show that the current revert functions have several limitations, including: - incomplete revert: https://github.com/linkedin/Liger-Kernel/pull/627#issuecomment-2757281103 #542 - not automatically updating old references: #385 -...
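One way a stale-reference problem of this kind can arise in plain Python is sketched below. It is a generic illustration of monkey-patching, not Liger-Kernel's code: the module `lib` and both functions are hypothetical, and whether this is the mechanism behind the linked issues is an assumption.

```python
import types

# Hypothetical stand-in for a library module; all names are illustrative.
lib = types.ModuleType("lib")


def original_forward(x):
    return x + 1


def patched_forward(x):
    return x + 100


lib.forward = original_forward

# Apply the patch by rebinding the module attribute.
lib.forward = patched_forward

# Some caller captures the function while the patch is active,
# the equivalent of `from lib import forward` at that moment.
captured_during_patch = lib.forward

# "Revert" by restoring the module attribute.
lib.forward = original_forward

print(lib.forward(1))            # 2: lookups through the module see the revert
print(captured_during_patch(1))  # 101: the earlier reference is still patched
```

Rebinding `lib.forward` changes only what the module attribute points to. Any name that was bound to the patched function earlier keeps pointing at it, so a revert that only restores module attributes cannot reach those references.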