Liger-Kernel
Efficient Triton Kernels for LLM Training
### 🐛 Describe the bug I am trying to instruction-tune Qwen2.5-14B-Instruct with [Liger Kernel](https://github.com/linkedin/Liger-Kernel). I know that the Liger Kernel is supported in the dev version of Hugging Face transformers....
### 🐛 Describe the bug When I load a model with AutoLigerKernelForCausalLM, I get ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) when loading the model Apply Model-Specific...
### 🚀 The feature, motivation and pitch The official implementation of flash attention is in CUDA, so on AMD GPUs, users cannot easily use flash attention in transformers to train...
## Summary Implemented FP8 GEMM with the E4M3 representation for FP8. [Issue #65](https://github.com/linkedin/Liger-Kernel/issues/65) ## Testing Done Tested square matrices of varying sizes (64, 256, 512, 1024, 2048) + non-square matrices...
### 🚀 The feature, motivation and pitch I want to use the liger-kernel fused operations in a codebase but do not need the transformers dependency. However, when I import...
### 🚀 The feature, motivation and pitch I would love to see support for the Cohere models. (https://huggingface.co/CohereForAI/c4ai-command-r-08-2024 & https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024) As far as I can tell the FusedLinearCrossEntropy kernel should...
### 🐛 Describe the bug I'm trying to test this library on an HPC cluster with AMD MI250X GPUs, but I'm getting a weird, seemingly Triton-related error specifically when I...
### 🐛 Describe the bug I'm encountering a ValueError when trying to load the Qwen2-VL model using the AutoLigerKernelForCausalLM class from the Liger Kernel. The error message indicates an unrecognized...
## Summary conv2d kernel for flux + other models ## Testing Done tested for correctness with forward and backward test suite - Hardware Type: 4090 - [x] run `make test`...
Hello, thank you for this great work. https://github.com/linkedin/Liger-Kernel/blob/acd82728207ebafad28d448640502c108901a967/src/liger_kernel/ops/fused_linear_cross_entropy.py#L69 https://github.com/linkedin/Liger-Kernel/blob/acd82728207ebafad28d448640502c108901a967/src/liger_kernel/ops/fused_linear_cross_entropy.py#L91-L96 I'm wondering if there are any reasons for upcasting/downcasting the logits dtype outside the kernel? If I understand correctly, we already...
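For readers unfamiliar with the E4M3 representation mentioned in the FP8 GEMM pull request summary above, the following is a minimal sketch of how a single E4M3 byte maps to a real value. It assumes the common OCP-style variant (the one PyTorch exposes as `torch.float8_e4m3fn`): 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and a single NaN pattern per sign. The pull request may use a different variant; `decode_e4m3` is a hypothetical helper name and is not part of Liger-Kernel.

```python
import math


def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 bit pattern into a Python float.

    Assumed layout (OCP-style "FN" variant, not confirmed by the PR):
    bit 7 = sign, bits 6..3 = exponent (bias 7), bits 2..0 = mantissa.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte in the range 0..255")

    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07

    # All exponent and mantissa bits set is the only NaN encoding;
    # this variant has no infinities.
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan

    # Exponent field 0 encodes subnormals: no implicit leading 1.
    if exponent == 0:
        return sign * (mantissa / 8.0) * 2.0 ** (1 - 7)

    # Normal numbers: implicit leading 1, exponent bias of 7.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)
```

Under these assumptions the largest finite value is 448 (bit pattern `0x7E`), which is why FP8 matrix multiplies typically need scaling factors to keep inputs inside the representable range.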