Liger-Kernel issues

No Significant Improvement Observed in Model Training Speed

1

I am trying to speed up inference and training of a `mistralai/Mistral-Small-3.1-24B-Instruct-2503` model. Simply replacing `AutoModelForCausalLM` with `AutoLigerKernelForCausalLM` does not improve my sampling speed or memory usage....

albertbou92

Support for windows build

### 🚀 The feature, motivation and pitch Liger-Kernel looks for a `triton` dependency, but on Windows Triton support is provided under the name `triton-windows` ### Alternatives Add support for Windows Triton ### Additional...

sorasoras

[DRAFT] Add deepseek v3 monkey patch

2

## Summary #623 ## Testing Done - Hardware Type: - [ ] run `make test` to ensure correctness - [ ] run `make checkstyle` to ensure code style - [...

zcnrex

DeepSeek Native Sparse Attention (NSA) Kernel

3

### 🚀 The feature, motivation and pitch Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention https://arxiv.org/abs/2502.11089 Potentially useful Python reference: https://github.com/dhcode-cpp/NSA-pytorch ### Alternatives _No response_ ### Additional context _No...

qingquansong

help wanted

feature

fun

[WIP] Update benchmark data

6

## Summary Rerun all benchmark scripts to get the latest data, so we have a reliable baseline for future optimization. Note: orpo failing with `compile=True` (plotting with old data...

Tcc0403

The Liger-Kernel library is running slower.

2

I'm training the Orpheus-TTS model using the transformers library. To speed it up, I'm using fsdp + sdpa + compile. However, when I tried liger-kernel for further acceleration, compile doesn't...

kadirnar

When enabling naive model parallelism using ```device_map```, the liger-kernel does not work.

5

### 🐛 Describe the bug When the model is split across multiple GPUs using ```device_map="auto"```, the liger-kernel will return a ```ValueError```. ### Reproduce ```import os os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' from transformers...

Songjw133

[bug] deepspeed zero++ multinode with liger kernel

2

### 🐛 Describe the bug #### deepspeed zero++ config - I ran the training with Slurm ``` { "zero_optimization": { "stage": 3, "stage3_gather_16bit_weights_on_model_save": true, "reduce_bucket_size": "auto", "zero_hpz_partition_size": 8, "zero_quantized_weights": true,...

SoundProvider

Further Improved Convergence test

### 🚀 The feature, motivation and pitch While performing PR #524, it was determined that testing the functionality of patching the Liger-Kernel onto model instances is insufficient. (It is recommended...

jp1924

[RFC] More robust revert functions for convergence tests

### 🐛 Describe the bug Many discussions show that the current revert functions have several limitations, including: - incomplete revert: https://github.com/linkedin/Liger-Kernel/pull/627#issuecomment-2757281103 #542 - not automatically updating old reference: #385 -...

Tcc0403
