Liger-Kernel
[bug] deepspeed zero++ multinode with liger kernel
🐛 Describe the bug
DeepSpeed ZeRO++ config
- I ran the training with Slurm
```json
{
  "zero_optimization": {
    "stage": 3,
    "stage3_gather_16bit_weights_on_model_save": true,
    "reduce_bucket_size": "auto",
    "zero_hpz_partition_size": 8,
    "zero_quantized_weights": true,
    "zero_quantized_gradients": true,
    "contiguous_gradients": true,
    "overlap_comm": true
  },
  "bf16": {
    "enabled": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```
Error message

```
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/liger_kernel/transformers/fused_linear_cross_entropy.py", line 38, in forward
  File "/usr/local/lib/python3.10/dist-packages/liger_kernel/ops/fused_linear_cross_entropy.py", line 77, in fused_linear_cross_entropy_forward
    logits_chunk = _input_chunk @ weight.t()  # chunk_size x V
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != c10::BFloat16
```
Reproduce
Sorry, I can't release the code.
The error can be reproduced with the Hugging Face Trainer and a Llama model.
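For reference, a minimal sketch of such a reproduction, assuming the Liger Llama patch plus the Hugging Face Trainer with the DeepSpeed config above (the checkpoint name, config filename, toy dataset, and hyperparameters are placeholders, not the original training script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Patch Llama with Liger kernels; the default patch includes the fused linear
# cross-entropy op that appears in the traceback.
apply_liger_kernel_to_llama()

model_name = "meta-llama/Llama-3.1-8B"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tiny toy dataset just so the Trainer has something to iterate over.
ids = tokenizer("hello world")["input_ids"]
train_dataset = [{"input_ids": ids, "labels": ids}] * 8

args = TrainingArguments(
    output_dir="out",
    bf16=True,                        # matches the "bf16": {"enabled": true} block above
    deepspeed="zeropp_config.json",   # the ZeRO++ JSON shown earlier (filename assumed)
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()  # when launched multinode via Slurm, the dtype error above is raised
```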
Versions
Environment Report:
- Operating System: Linux-5.15.0-60-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Liger Kernel version: 0.5.4
- PyTorch version: 2.2.0a0+81ea7a4
- CUDA version: 12.3
- HIP(ROCm) version: Not available
- Triton version: 3.0.0
- Transformers version: 4.49.0