Liger-Kernel
Efficient Triton Kernels for LLM Training
### 🐛 Describe the bug

CI details: [Qwen2VLConfig](https://github.com/linkedin/Liger-Kernel/actions/runs/15175089532/job/42680679210?pr=689#step:6:6549), [monkey patch impl related](https://github.com/linkedin/Liger-Kernel/actions/runs/15214784023/job/42797584570#step:5:1903)

The text config is separated out from the general config in transformers>=4.52.0. [Qwen2VLRotaryEmbedding](https://github.com/huggingface/transformers/pull/37268/files#diff-09bc594f9680f1d042fd485106c68022d77b59831697a00b3b38f12a3e40f395L103-R104) now takes `Qwen2VLTextConfig` instead of `Qwen2VLConfig`...
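A minimal sketch of version-gated config selection that could keep the monkey patch working on both sides of the change. The helper name `get_rope_config` is hypothetical, and `get_text_config()` is assumed to be available on recent transformers configs:

```python
# Hedged sketch, not Liger-Kernel's actual patch code.
import transformers
from packaging import version


def get_rope_config(config):
    # transformers>=4.52.0 splits the text config out of the composite
    # Qwen2VLConfig; Qwen2VLRotaryEmbedding now expects the text config.
    if version.parse(transformers.__version__) >= version.parse("4.52.0"):
        return config.get_text_config()
    return config
```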
I am using the Windows build of Triton, installed via `pip install triton-windows==3.3.0.post19`.
### 🚀 The feature, motivation and pitch

https://huggingface.co/collections/deepseek-ai/deepseek-v3-676bc4546fb4876383c4208b

### Alternatives

_No response_

### Additional context

_No response_
## Summary

Attempts to fix https://github.com/linkedin/Liger-Kernel/issues/439. As the issue above notes, chunking the hidden state across the batch dimension has limited benefit, so this change chunks the hidden state across the (batch*seq_len) dimension instead (see the sketch below). As it requires...
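A minimal sketch of the chunking idea, assuming a `(batch, seq_len, hidden)` input; `num_chunks` is an illustrative parameter, not the PR's actual interface:

```python
import torch


def chunk_hidden_states(hidden: torch.Tensor, num_chunks: int):
    # Merge the batch and sequence dims so the number of usable chunks
    # is not capped by the batch size alone.
    B, T, H = hidden.shape
    flat = hidden.reshape(B * T, H)
    return torch.chunk(flat, num_chunks, dim=0)
```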
## Summary

The default `errorbar` is `None`, which leads to `TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'`. This change stops overriding `errorbar` and lets `seaborn` decide the...
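A minimal sketch of the fix described, with a placeholder DataFrame `df` and column names; the point is simply to omit the `errorbar` kwarg rather than pass `None`:

```python
import seaborn as sns


def plot_speed(df):
    # Before (problematic): sns.lineplot(data=df, x="N", y="ms", errorbar=None)
    # After: leave `errorbar` unset so seaborn applies its own default.
    return sns.lineplot(data=df, x="N", y="ms")
```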
## Summary

This change is related to performance tuning on the Intel Max 1550 GPUs. By keeping the block and warp sizes the same in the forward and backward Triton...
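A minimal Triton sketch of the idea, with illustrative values rather than the PR's tuned ones: both the forward and backward kernels launch with one shared block size and warp count, so the two passes cannot drift apart:

```python
import torch
import triton
import triton.language as tl

# One launch configuration shared by both passes (values illustrative).
BLOCK_SIZE = 1024
NUM_WARPS = 8


@triton.jit
def _scale_fwd(x_ptr, out_ptr, n_elements, scale, BLOCK_SIZE: tl.constexpr):
    offsets = tl.program_id(0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * scale, mask=mask)


@triton.jit
def _scale_bwd(dy_ptr, dx_ptr, n_elements, scale, BLOCK_SIZE: tl.constexpr):
    offsets = tl.program_id(0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    dy = tl.load(dy_ptr + offsets, mask=mask)
    tl.store(dx_ptr + offsets, dy * scale, mask=mask)


def scale_forward(x: torch.Tensor, scale: float) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, BLOCK_SIZE),)
    # Same BLOCK_SIZE / num_warps as the backward launch would use.
    _scale_fwd[grid](x, out, n, scale, BLOCK_SIZE=BLOCK_SIZE, num_warps=NUM_WARPS)
    return out
```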
### 🚀 The feature, motivation and pitch

An upcoming refactor of the transformers VLM models: https://github.com/huggingface/transformers/pull/37033. `XXXForConditionalGeneration` no longer has a `language_model` attribute for the `ForCausalLM` part; it will be changed to a `model` attribute to...
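A hedged compatibility shim for code that must run on both sides of the refactor might look like this (the helper is hypothetical, not part of either library):

```python
def get_language_model(vlm):
    # Before the refactor, VLMs expose the decoder as `language_model`;
    # after it, the attribute is renamed to `model`.
    if hasattr(vlm, "language_model"):
        return vlm.language_model
    return vlm.model
```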
### 🚀 The feature, motivation and pitch

In knowledge distillation, it is more efficient to add support for teacher `logits`/`logprobs` pre-computed offline beforehand, rather than loading and forwarding...
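A minimal sketch of what such support could look like: a distillation loss that consumes teacher logits loaded from disk instead of running a teacher forward pass at training time (the function name and signature are illustrative, not existing Liger-Kernel API):

```python
import torch
import torch.nn.functional as F


def distill_loss(student_logits: torch.Tensor,
                 teacher_logits: torch.Tensor,
                 temperature: float = 1.0) -> torch.Tensor:
    # teacher_logits were precomputed offline, so no teacher model needs
    # to be loaded or forwarded here.
    t = temperature
    teacher_logprobs = F.log_softmax(teacher_logits / t, dim=-1)
    student_logprobs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_logprobs, teacher_logprobs,
                    log_target=True, reduction="batchmean") * (t * t)
```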
## Summary

Implements #537.

## Details

I don't know if `modeling_solar.py` and `configuration_solar.py` are in the right place. I also changed `labels is not None` to `self.training and (labels is...
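A minimal sketch of the guard change, assuming the truncated condition completes as `labels is not None` (an assumption; the original text is cut off) and using an illustrative module rather than the actual Solar modeling code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LMHeadWithLoss(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_states, labels=None):
        # Take the loss path only during training, not merely whenever
        # labels happen to be present (the change described above).
        if self.training and (labels is not None):  # assumed completion
            logits = self.lm_head(hidden_states)
            return F.cross_entropy(
                logits.view(-1, logits.size(-1)), labels.view(-1)
            )
        return self.lm_head(hidden_states)
```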
### 🚀 The feature, motivation and pitch

The Llama 4 models are auto-regressive language models that use a mixture-of-experts (MoE) architecture and incorporate early fusion for native multimodality. https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct ...