Liger-Kernel
Efficient Triton Kernels for LLM Training
### 🐛 Describe the bug When using LigerGEGLUMLP with torch.compile, I get the following error. ``` UserWarning: Traceback (most recent call last): Encountered an exception in identify_mutated_tensors, assuming every...
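A minimal, hypothetical repro sketch of the setup described above. The import path, config class, and tensor shapes are my assumptions rather than details from the report; LigerGEGLUMLP is assumed here to mirror the HF Gemma MLP interface.

```python
import torch
from transformers.models.gemma.configuration_gemma import GemmaConfig
from liger_kernel.transformers.geglu import LigerGEGLUMLP  # assumed import path

# Small Gemma-style config; LigerGEGLUMLP is assumed to accept an HF config
# exposing hidden_size / intermediate_size / hidden_act, like the Gemma MLP.
config = GemmaConfig(hidden_size=64, intermediate_size=256, hidden_act="gelu_pytorch_tanh")

mlp = LigerGEGLUMLP(config).to("cuda", torch.bfloat16)
compiled_mlp = torch.compile(mlp)

x = torch.randn(2, 16, config.hidden_size, device="cuda", dtype=torch.bfloat16)
out = compiled_mlp(x)  # on affected torch versions this emits the
                       # "Encountered an exception in identify_mutated_tensors" UserWarning
```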
### 🐛 Describe the bug https://github.com/linkedin/Liger-Kernel/blob/e249eee723978bf8610ff1ea2297d048a2417e20/test/transformers/test_swiglu.py#L46 https://github.com/linkedin/Liger-Kernel/blob/e249eee723978bf8610ff1ea2297d048a2417e20/test/transformers/test_geglu.py#L38 Tolerances of 1e0 for fp32 and 1e4 for bf16 seem a little excessive. If the kernels don't cause models to diverge, this test ought to pass...
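For reference, a hedged sketch of what a tighter comparison could look like; the tolerance values below are illustrative suggestions on my part, not the project's current settings (the values the issue flags are 1e0 for fp32 and 1e4 for bf16).

```python
import torch

# Illustrative per-dtype tolerances (assumed "reasonable" values, not from the repo).
TOLERANCES = {
    torch.float32: dict(atol=1e-5, rtol=1e-5),
    torch.bfloat16: dict(atol=1e-2, rtol=1e-2),
}

def assert_outputs_match(liger_out: torch.Tensor, ref_out: torch.Tensor) -> None:
    # Compare the Triton kernel output against the reference implementation
    # with dtype-appropriate tolerances instead of near-unbounded ones.
    tol = TOLERANCES[ref_out.dtype]
    torch.testing.assert_close(liger_out, ref_out, **tol)
```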
### 🚀 The feature, motivation and pitch @thomwolf and I have an idea to implement Llama from scratch in pure Triton, inspired by Karpathy. Liger Kernel already contains most of...
### 🚀 The feature, motivation and pitch FLCE needs special handling for the soft capping in Gemma2 (see the sketch below): https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma2/modeling_gemma2.py#L1054 ### Alternatives _No response_ ### Additional context _No response_
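For context, the referenced Gemma2 code applies final-logit soft capping as `softcap * tanh(logits / softcap)`. A minimal sketch of that transform, which a fused linear cross entropy (FLCE) kernel would need to fold in rather than apply to materialized logits:

```python
import torch

def softcap_logits(logits: torch.Tensor, softcap: float) -> torch.Tensor:
    # Gemma2-style soft capping: scale down, squash with tanh, scale back up.
    # In eager HF code this runs on the full logits tensor; FLCE never
    # materializes those logits, so the same transform has to happen
    # inside the fused kernel.
    return softcap * torch.tanh(logits / softcap)
```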
### 🚀 The feature, motivation and pitch From https://discord.com/channels/1189498204333543425/1275130785933951039/1278522387653984358. Have fun 🐍! ### Alternatives _No response_ ### Additional context _No response_
I'm assuming it only works on Ampere, Hopper, and Lovelace. Is that correct? If it is limited to certain GPU types, it might be nice to specify that in the README.
### 🚀 The feature, motivation and pitch It would be nice to support DeepseekV2 models. Unfortunately, the modeling code has not yet been accepted into transformers and requires trust_remote_code=True. I'm monkey-patching...
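A rough sketch of the kind of monkey-patch meant here. The remote module and class names are assumptions about the trust_remote_code checkout, and patching the class only affects modules constructed after the patch is applied.

```python
import importlib
from liger_kernel.transformers.rms_norm import LigerRMSNorm  # assumed import path

def patch_deepseek_v2_rms_norm(model) -> None:
    # The DeepseekV2 modeling code is loaded dynamically via trust_remote_code,
    # so look up the module the model class actually came from.
    modeling = importlib.import_module(type(model).__module__)
    # Assumed class name in the remote modeling file. Swapping the class only
    # helps for models instantiated afterwards; already-built submodules would
    # need to be replaced instance by instance.
    modeling.DeepseekV2RMSNorm = LigerRMSNorm
```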
### 🚀 The feature, motivation and pitch Liger Kernel is currently incompatible with encoder-only transformer architectures such as BERT, DistilBERT, RoBERTa, XLM-R, and DeBERTa. Given the importance these models still...
### 🐛 Describe the bug When trying to train LoRA layers on the base model while also setting modules_to_save on the LoRA config, which makes the embedding layers trainable...
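A hypothetical repro sketch of that setup; the model name and target modules are illustrative choices, not details from the report.

```python
from peft import LoraConfig, get_peft_model
from liger_kernel.transformers import AutoLigerKernelForCausalLM

model = AutoLigerKernelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

lora_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],          # LoRA adapters on attention projections
    modules_to_save=["embed_tokens", "lm_head"],  # fully trains the embedding layers
)
model = get_peft_model(model, lora_config)
# Training this combination is what triggers the reported failure.
```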
### 🚀 The feature, motivation and pitch Right now our implementation of RoPE assumes the rotation matrix is created and used the way the [HuggingFace model code](https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/models/llama/modeling_llama.py#L253) does it, i.e. instead of...
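For reference, the HuggingFace convention being assumed, simplified from the linked modeling_llama.py (the unsqueeze handling of cos/sin is omitted here):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # HF splits the head dimension into two contiguous halves and rotates them,
    # rather than interleaving even/odd channels as in the original RoPE formulation.
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    # cos/sin are precomputed per position and broadcast across heads.
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```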