Liger-Kernel issues

Loss does not drop when using Liger Kernel at Qwen2.5

14

### 🐛 Describe the bug I am trying to instruction-tune Qwen2.5-14B-Instruct with [Liger Kernel](https://github.com/linkedin/Liger-Kernel). I know that Liger Kernel is supported in the dev version of Hugging Face transformers....

Se-Hun

Inference with Qwen2 model: the output is garbled and ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)

1

### 🐛 Describe the bug When I load the model with AutoLigerKernelForCausalLM, I get ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) when loading the model. Apply Model-Specific...

TulipDu

[AMD] Implement Flash Attention in Triton to enable transformers to run with Flash Attention on AMD GPUs.

4

### 🚀 The feature, motivation and pitch The official implementation of flash attention is in CUDA, so on AMD GPUs, users cannot easily use flash attention on transformers to training...

ByronHsu

feature

AMD

gemm fp8 e4m3

3

## Summary Implemented FP8 gemm with E4M3 representation for FP8. [Issue #65](https://github.com/linkedin/Liger-Kernel/issues/65) ## Testing Done tested square matrices of varying sizes (64, 256, 512, 1024, 2048) + non-square matrices...

AndreSlavescu

Optional dependency on transformers

5

### 🚀 The feature, motivation and pitch I want to use the liger-kernel fused operations in a codebase that does not need the transformers dependency. However, when I import...

DuarteMRAlves

Support for Cohere models

1

### 🚀 The feature, motivation and pitch I would love to see support for the Cohere models. (https://huggingface.co/CohereForAI/c4ai-command-r-08-2024 & https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024) As far as I can tell the FusedLinearCrossEntropy kernel should...

nyxkrage

Triton error on AMD GPUs

8

### 🐛 Describe the bug I'm trying to test this library on an HPC cluster with AMD MI250X GPUs, but I'm getting a weird, seemingly Triton-related error specifically when I...

eminorhan

ValueError when Loading Qwen2-VL Model with Liger Kernel

1

### 🐛 Describe the bug I'm encountering a ValueError when trying to load the Qwen2-VL model using the AutoLigerKernelForCausalLM class from the Liger Kernel. The error message indicates an unrecognized...

rahatarinasir

[Operator] conv2d

1

## Summary conv2d kernel for flux + other models ## Testing Done tested for correctness with forward and backward test suite - Hardware Type: 4090 - [x] run `make test`...

AndreSlavescu

Reasons for upcasting the logits dtype outside the kernel

7

Hello, thank you for this great work. https://github.com/linkedin/Liger-Kernel/blob/acd82728207ebafad28d448640502c108901a967/src/liger_kernel/ops/fused_linear_cross_entropy.py#L69 https://github.com/linkedin/Liger-Kernel/blob/acd82728207ebafad28d448640502c108901a967/src/liger_kernel/ops/fused_linear_cross_entropy.py#L91-L96 I'm wondering if there are any reasons for upcasting/downcasting the logits dtype outside the kernel? If I understand correctly, we already...

yzhangcs
