Liger-Kernel

When enabling naive model parallelism using `device_map`, the liger-kernel does not work.

Open Songjw133 opened this issue 8 months ago • 5 comments

🐛 Describe the bug

When the model is split across multiple GPUs with device_map="auto", the forward pass of the Liger-patched model raises a ValueError.

Reproduce

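import os

# Expose two GPUs so that device_map="auto" can split the model across them.
# This is set before torch is imported, as in the original script order.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

import torch
from transformers import AutoModelForCausalLM, set_seed
from transformers.loss.loss_utils import ForCausalLMLoss
from liger_kernel.transformers import apply_liger_kernel_to_qwen2

set_seed(0)

# Patch the Qwen2 modules with Liger kernels before the model is created.
apply_liger_kernel_to_qwen2(
    rope=True,
    swiglu=True,
    cross_entropy=False,
    fused_linear_cross_entropy=False,
    rms_norm=True,
)

# Load the model with its layers spread across the visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "./Qwen2.5-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.train()

# Minimal single-token batch; the tensors are created on the CPU.
inputs = {
    'input_ids': torch.tensor([1]).unsqueeze(0),
    'attention_mask': torch.tensor([1]).unsqueeze(0),
    'labels': torch.tensor([2]).unsqueeze(0),
}

# The forward pass is where the ValueError is raised.
loss = model(**inputs).loss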

Output:

ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
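
To narrow this down, it may help to confirm how the layers were actually placed. The sketch below is only a diagnostic aid, not a fix: it groups module names by device from a placement dictionary shaped like `model.hf_device_map`. The helper names and the example mapping are hypothetical and are not taken from the run above.

```python
from collections import defaultdict


def group_modules_by_device(device_map):
    """Group module names by the device they were placed on."""
    groups = defaultdict(list)
    for module_name, device in device_map.items():
        # Devices may be ints (GPU index) or strings such as "cpu" or "disk".
        groups[str(device)].append(module_name)
    return dict(groups)


def spans_multiple_devices(device_map):
    """Return True if the placement uses more than one device."""
    return len(group_modules_by_device(device_map)) > 1


# Hypothetical placement for illustration only; with a real model one would
# pass model.hf_device_map instead.
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.norm": 1,
    "lm_head": 1,
}

placement = group_modules_by_device(example_map)
for device, modules in sorted(placement.items()):
    print(f"device {device}: {modules}")
print("split across devices:", spans_multiple_devices(example_map))
```

If the placement does span more than one GPU, that would be consistent with the failure only appearing under `device_map="auto"`, though the report itself does not identify the root cause.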

Versions

Python version: 3.11.11
Liger Kernel version: 0.5.4
PyTorch version: 2.5.1+cu124
CUDA version: 12.4
HIP(ROCm) version: Not available
Triton version: 3.1.0
Transformers version: 4.49.0
XPU version: XPU Not Available

Mar 03 '25 13:03 Songjw133
