Liger-Kernel
Efficient Triton Kernels for LLM Training
### 🐛 Describe the bug When using LigerGEGLUMLP with torch.compile, I get the following error. ``` UserWarning: Traceback (most recent call last): Encountered an exception in identify_mutated_tensors, assuming every...
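A minimal, hypothetical repro sketch of the setup described above. The import path, config class, and tensor shapes are my assumptions rather than details from the report; LigerGEGLUMLP is assumed here to mirror the HF Gemma MLP interface.

```python
import torch
from transformers.models.gemma.configuration_gemma import GemmaConfig
from liger_kernel.transformers.geglu import LigerGEGLUMLP  # assumed import path

# Small Gemma-style config; LigerGEGLUMLP is assumed to accept an HF config
# exposing hidden_size / intermediate_size / hidden_act, like the Gemma MLP.
config = GemmaConfig(hidden_size=64, intermediate_size=256, hidden_act="gelu_pytorch_tanh")

mlp = LigerGEGLUMLP(config).to("cuda", torch.bfloat16)
compiled_mlp = torch.compile(mlp)

x = torch.randn(2, 16, config.hidden_size, device="cuda", dtype=torch.bfloat16)
out = compiled_mlp(x)  # on affected torch versions this emits the
                       # "Encountered an exception in identify_mutated_tensors" UserWarning
```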
### 🐛 Describe the bug https://github.com/linkedin/Liger-Kernel/blob/e249eee723978bf8610ff1ea2297d048a2417e20/test/transformers/test_swiglu.py#L46 https://github.com/linkedin/Liger-Kernel/blob/e249eee723978bf8610ff1ea2297d048a2417e20/test/transformers/test_geglu.py#L38 Tolerances of 1e0 for fp32 and 1e4 for bf16 seem a little excessive. If the kernels don't cause models to diverge, this test ought to pass...
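For reference, a hedged sketch of what a tighter comparison could look like; the tolerance values below are illustrative suggestions on my part, not the project's current settings (the values the issue flags are 1e0 for fp32 and 1e4 for bf16).

```python
import torch

# Illustrative per-dtype tolerances (assumed "reasonable" values, not from the repo).
TOLERANCES = {
    torch.float32: dict(atol=1e-5, rtol=1e-5),
    torch.bfloat16: dict(atol=1e-2, rtol=1e-2),
}

def assert_outputs_match(liger_out: torch.Tensor, ref_out: torch.Tensor) -> None:
    # Compare the Triton kernel output against the reference implementation
    # with dtype-appropriate tolerances instead of near-unbounded ones.
    tol = TOLERANCES[ref_out.dtype]
    torch.testing.assert_close(liger_out, ref_out, **tol)
```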
### 🚀 The feature, motivation and pitch @thomwolf and I have an idea to implement Llama from scratch in pure Triton, inspired by Karpathy. Liger Kernel already contains most of...
### 🚀 The feature, motivation and pitch FLCE needs special handling for the soft capping in Gemma2 (see the sketch below): https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma2/modeling_gemma2.py#L1054 ### Alternatives _No response_ ### Additional context _No response_
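For context, the referenced Gemma2 code applies final-logit soft capping as `softcap * tanh(logits / softcap)`. A minimal sketch of that transform, which a fused linear cross entropy (FLCE) kernel would need to fold in rather than apply to materialized logits:

```python
import torch

def softcap_logits(logits: torch.Tensor, softcap: float) -> torch.Tensor:
    # Gemma2-style soft capping: scale down, squash with tanh, scale back up.
    # In eager HF code this runs on the full logits tensor; FLCE never
    # materializes those logits, so the same transform has to happen
    # inside the fused kernel.
    return softcap * torch.tanh(logits / softcap)
```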
### 🚀 The feature, motivation and pitch From https://discord.com/channels/1189498204333543425/1275130785933951039/1278522387653984358. Have fun 🐍! ### Alternatives _No response_ ### Additional context _No response_
I'm assuming it only works on Ampere, Hopper, and Lovelace. Is that correct? If it is limited to certain GPU types, it might be nice to specify that in the README.
### 🚀 The feature, motivation and pitch It would be nice to support DeepseekV2 models. Unfortunately, the modeling code has not yet been accepted into transformers and requires trust_remote_code=True. I'm monkey-patching...
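A rough sketch of the kind of monkey-patch meant here. The remote module and class names are assumptions about the trust_remote_code checkout, and patching the class only affects modules constructed after the patch is applied.

```python
import importlib
from liger_kernel.transformers.rms_norm import LigerRMSNorm  # assumed import path

def patch_deepseek_v2_rms_norm(model) -> None:
    # The DeepseekV2 modeling code is loaded dynamically via trust_remote_code,
    # so look up the module the model class actually came from.
    modeling = importlib.import_module(type(model).__module__)
    # Assumed class name in the remote modeling file. Swapping the class only
    # helps for models instantiated afterwards; already-built submodules would
    # need to be replaced instance by instance.
    modeling.DeepseekV2RMSNorm = LigerRMSNorm
```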
### 🚀 The feature, motivation and pitch Liger Kernel is currently incompatible with encoder-only transformer architectures such as BERT, DistilBERT, RoBERTa, XLM-R, and DeBERTa. Given the importance these models still...
### 🐛 Describe the bug When trying to train LoRA layers on the base model while also setting modules_to_save on the LoRA config, which makes the embedding layers trainable...
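A hypothetical repro sketch of that setup; the model name and target modules are illustrative choices, not details from the report.

```python
from peft import LoraConfig, get_peft_model
from liger_kernel.transformers import AutoLigerKernelForCausalLM

model = AutoLigerKernelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

lora_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],          # LoRA adapters on attention projections
    modules_to_save=["embed_tokens", "lm_head"],  # fully trains the embedding layers
)
model = get_peft_model(model, lora_config)
# Training this combination is what triggers the reported failure.
```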
### 🚀 The feature, motivation and pitch Right now our implementation of RoPE assumes the rotation matrix is created and used the way the [HuggingFace model code](https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/models/llama/modeling_llama.py#L253) does it, i.e. instead of...
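For reference, the HuggingFace convention being assumed, simplified from the linked modeling_llama.py (the unsqueeze handling of cos/sin is omitted here):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # HF splits the head dimension into two contiguous halves and rotates them,
    # rather than interleaving even/odd channels as in the original RoPE formulation.
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    # cos/sin are precomputed per position and broadcast across heads.
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```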