Liger-Kernel

Efficient Triton Kernels for LLM Training

Results: 114 Liger-Kernel issues, sorted by most recently updated

### 🐛 Describe the bug When using LigerGEGLUMLP with torch.compile I get the following error. ``` UserWarning: Traceback (most recent call last): Encountered an exception in identify_mutated_tensors, assuming every...

bug
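
For the torch.compile report above, a minimal repro sketch of the setup being described. The import path and the HF-style config handed to LigerGEGLUMLP are assumptions for illustration; they are not taken from the issue.

```python
# Hypothetical repro sketch -- import path and constructor arguments are assumptions.
import torch
from transformers import LlamaConfig
from liger_kernel.transformers.geglu import LigerGEGLUMLP  # assumed import path

config = LlamaConfig(hidden_size=512, intermediate_size=1024, hidden_act="gelu_pytorch_tanh")
mlp = LigerGEGLUMLP(config).to("cuda", dtype=torch.bfloat16)
compiled_mlp = torch.compile(mlp)

x = torch.randn(2, 16, 512, device="cuda", dtype=torch.bfloat16, requires_grad=True)
out = compiled_mlp(x)   # the UserWarning is reportedly emitted while tracing the Triton op
out.sum().backward()
```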

### 🐛 Describe the bug https://github.com/linkedin/Liger-Kernel/blob/e249eee723978bf8610ff1ea2297d048a2417e20/test/transformers/test_swiglu.py#L46 https://github.com/linkedin/Liger-Kernel/blob/e249eee723978bf8610ff1ea2297d048a2417e20/test/transformers/test_geglu.py#L38 Tolerances of 1e0 for fp32 and 1e4 for bf16 seem a little excessive. If the kernels don't cause models to diverge, this test ought to pass...
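
To make the concern concrete, here is a small standalone illustration (not taken from the test suite) of how an absolute tolerance of 1e0 can hide real numerical differences in fp32:

```python
import torch

a = torch.zeros(4)
b = torch.full((4,), 0.9)  # clearly different values

# With atol=1e0 the check still "passes", so a real regression could slip through.
print(torch.allclose(a, b, atol=1e0, rtol=0.0))    # True
# A tighter fp32 tolerance catches the difference.
print(torch.allclose(a, b, atol=1e-5, rtol=1e-5))  # False
```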

### 🚀 The feature, motivation and pitch @thomwolf and I have an idea to implement Llama from scratch in pure Triton, inspired by Karpathy. Liger Kernel already contains most of...

fun
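
For flavour, a minimal standalone element-wise SiLU kernel showing what a "pure Triton" building block looks like; this is an illustrative sketch, not code from Liger-Kernel or the proposed project.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def silu_kernel(x_ptr, y_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = x * tl.sigmoid(x)  # SiLU(x) = x * sigmoid(x)
    tl.store(y_ptr + offsets, y, mask=mask)

def silu(x: torch.Tensor) -> torch.Tensor:
    y = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    silu_kernel[grid](x, y, n, BLOCK_SIZE=1024)
    return y

# Usage: y = silu(torch.randn(4096, device="cuda"))
```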

### 🚀 The feature, motivation and pitch FLCE needs special handling for the soft capping in Gemma2: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma2/modeling_gemma2.py#L1054 ### Alternatives _No response_ ### Additional context _No response_

feature
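
For context, the soft capping referenced above is applied to the final logits in the Gemma2 modeling code and, in plain PyTorch, amounts to the sketch below. The cap value of 30.0 is only illustrative; the real value comes from the model config (final_logit_softcapping).

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Squashes logits smoothly into the open interval (-cap, cap) before the loss.
    return cap * torch.tanh(logits / cap)

logits = torch.randn(2, 8, 256_000)
capped = soft_cap(logits, cap=30.0)  # illustrative cap value
```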

### 🚀 The feature, motivation and pitch From https://discord.com/channels/1189498204333543425/1275130785933951039/1278522387653984358. Have fun 🐍! ### Alternatives _No response_ ### Additional context _No response_

fun

I'm assuming it only works on Ampere, Hopper, and Lovelace. Is that correct? It might be nice to specify in the README if it is limited to certain GPU types.

documentation
help wanted
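
In the meantime, a quick runtime probe for the GPU generation in question; this only inspects the CUDA compute capability and is not an official support statement.

```python
import torch

# Ampere is SM 8.0, Lovelace SM 8.9, Hopper SM 9.0.
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")
if major >= 8:
    print("Ampere/Lovelace/Hopper-class GPU")
else:
    print("pre-Ampere GPU; Triton kernel support may be more limited")
```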

### 🚀 The feature, motivation and pitch It would be nice to support DeepseekV2 models. Unfortunately, the modeling code is not yet accepted into transformers and requires trust_remote_code=True. I'm monkey-patching...

help wanted
huggingface
feature
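
The author's actual monkey patch is not shown in the excerpt; the sketch below only illustrates the general pattern of swapping submodules of a trust_remote_code model for Liger equivalents. The Liger import path, the LigerRMSNorm constructor, the variance_epsilon attribute, and the DeepSeek checkpoint name are all assumptions.

```python
from transformers import AutoModelForCausalLM
from liger_kernel.transformers.rms_norm import LigerRMSNorm  # assumed import path

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V2-Lite",  # illustrative checkpoint
    trust_remote_code=True,
)

# Collect every *RMSNorm module, then swap each for the Liger kernel, keeping the weights.
replacements = [
    (parent, name, child)
    for parent in model.modules()
    for name, child in parent.named_children()
    if child.__class__.__name__.endswith("RMSNorm")
]
for parent, name, child in replacements:
    liger_norm = LigerRMSNorm(
        child.weight.shape[0],
        eps=getattr(child, "variance_epsilon", 1e-6),  # assumed attribute name
    )
    liger_norm.weight.data.copy_(child.weight.data)
    setattr(parent, name, liger_norm)
```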

### 🚀 The feature, motivation and pitch Liger Kernel is currently incompatible with encoder-only transformer architectures such as BERT, DistilBERT, RoBERTa, XLM-R, and DeBERTa. Given the importance these models still...

feature

### 🐛 Describe the bug When trying to train LoRA layers on the base model while also setting modules_to_save on the LoRA config, which makes the embedding layers trainable...

bug
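
A hedged sketch of the configuration being described: Liger kernels patched in, LoRA adapters on the attention projections, and modules_to_save making the embedding and output layers fully trainable. The checkpoint and module names are illustrative for a Llama-style model and are not taken from the issue.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from liger_kernel.transformers import apply_liger_kernel_to_llama  # assumed patch entry point

apply_liger_kernel_to_llama()  # swap HF Llama ops for Liger kernels before loading
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # illustrative

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "lm_head"],  # makes the embedding layers trainable
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```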

### 🚀 The feature, motivation and pitch Right now our implementation of RoPE assumes the rotation matrix is created and used the way it is in the [HuggingFace model code](https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/models/llama/modeling_llama.py#L253), i.e. instead of...

enhancement
good first issue
feature
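
To make the distinction concrete, below is an illustrative sketch of the two common RoPE layout conventions: the HuggingFace rotate_half style the kernel currently assumes, and the interleaved style used by some other implementations. This is not the Liger kernel itself.

```python
import torch

# HuggingFace convention: dimension i is paired with dimension i + d/2.
def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope_hf(q: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    return q * cos + rotate_half(q) * sin

# Interleaved convention: adjacent dimensions (0, 1), (2, 3), ... form the rotated pairs.
def rotate_interleaved(x: torch.Tensor) -> torch.Tensor:
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    return torch.stack((-x_odd, x_even), dim=-1).flatten(-2)
```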