Liger-Kernel
Liger-Kernel does not work when naive model parallelism is enabled via `device_map`
🐛 Describe the bug
When the model is split across multiple GPUs with `device_map="auto"`, a forward pass through the Liger-patched model raises a ValueError.
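To confirm that the model really is spread over more than one GPU, the placement chosen by `device_map="auto"` can be inspected. The following is a minimal sketch, assuming the loaded model exposes its placement as an `hf_device_map` dictionary (module name to device), as Transformers models loaded with a `device_map` do. The helper name `group_modules_by_device` and the sample mapping are hypothetical and only for illustration.

```python
from collections import defaultdict


def group_modules_by_device(device_map):
    """Group module names by the device they were placed on."""
    grouped = defaultdict(list)
    for module_name, device in device_map.items():
        # Devices may be ints (GPU index) or strings such as "cpu".
        grouped[str(device)].append(module_name)
    return dict(grouped)


# Hypothetical placement, shaped like `model.hf_device_map`.
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.norm": 1,
    "lm_head": 1,
}

placement = group_modules_by_device(example_map)
for device, modules in sorted(placement.items()):
    print(f"device {device}: {len(modules)} module(s)")
```

With the real model, pass `model.hf_device_map` instead of `example_map`. More than one key in the result means the layers are sharded across devices, which is the setup in which the error appears.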
Reproduce
import os

# Expose two GPUs so that device_map="auto" can spread the model across them.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

import torch
from transformers import AutoModelForCausalLM, set_seed
from transformers.loss.loss_utils import ForCausalLMLoss
from liger_kernel.transformers import apply_liger_kernel_to_qwen2

set_seed(0)

# Patch the Qwen2 modules with Liger kernels before the model is instantiated.
apply_liger_kernel_to_qwen2(
    rope=True,
    swiglu=True,
    cross_entropy=False,
    fused_linear_cross_entropy=False,
    rms_norm=True,
)

# Naive model parallelism: layers are placed on different GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "./Qwen2.5-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.train()

inputs = {
    'input_ids': torch.tensor([1]).unsqueeze(0),
    'attention_mask': torch.tensor([1]).unsqueeze(0),
    'labels': torch.tensor([2]).unsqueeze(0),
}
loss = model(**inputs).loss  # raises the ValueError shown below
output:
ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
Versions
Python version: 3.11.11
Liger Kernel version: 0.5.4
PyTorch version: 2.5.1+cu124
CUDA version: 12.4
HIP(ROCm) version: Not available
Triton version: 3.1.0
Transformers version: 4.49.0
XPU version: XPU Not Available
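For anyone comparing environments, a version list like the one above can be gathered with the standard library alone. This is a rough sketch, not the tool used to produce the report. The helper `collect_versions` is hypothetical, and the distribution names in `packages` are assumptions that may need adjusting for your installation.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {package: version string}, or "not installed" if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


# Distribution names are assumptions; adjust them to your environment.
packages = ["liger-kernel", "torch", "triton", "transformers"]

print(f"Python version: {platform.python_version()}")
for name, ver in collect_versions(packages).items():
    print(f"{name}: {ver}")
```

Packages that are not installed are reported as such rather than raising, so the script can be run in any environment.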