Liger-Kernel
Loss does not drop when using Liger Kernel with Qwen2.5
🐛 Describe the bug
I am trying to instruction-tune Qwen2.5-14B-Instruct with Liger Kernel.
I know that Liger Kernel is supported in the dev version of Hugging Face Transformers. However, when training the Qwen2.5 model with Liger Kernel enabled, the loss value does not drop. Is Qwen2.5 not supported yet?
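For what it's worth, Liger Kernel can also be applied through its own patching API instead of the Transformers integration, which could help isolate where the problem is. A minimal sketch, assuming the liger-kernel package is installed (Qwen2.5 shares the Qwen2 architecture, so the Qwen2 patch should be the relevant one):
from liger_kernel.transformers import apply_liger_kernel_to_qwen2

# Patch the Qwen2 modeling classes before the model is instantiated,
# so from_pretrained below picks up the Liger kernels.
apply_liger_kernel_to_qwen2()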
Reproduce
Python code example:
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer

model_name = "Qwen/Qwen2.5-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
...  # training_args and train_dataset are constructed here (elided)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
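One way to check whether the Trainer actually swapped in the Liger ops on this model instance is to look at the norm layers after Trainer construction. This is a sketch; the module path is an assumption based on liger-kernel's public layout:
from liger_kernel.transformers.rms_norm import LigerRMSNorm

# If the patch took effect on this model instance, the norm layers
# should be Liger classes; False here would point at the patching
# path rather than at the kernels themselves.
print(any(isinstance(m, LigerRMSNorm) for m in model.modules()))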
Run example:
deepspeed --include localhost:0,1 --master_port 61000 train.py \
--learning_rate=1e-5 \
--lr_scheduler_type=cosine \
--max_length=8192 \
--per_device_train_batch_size=4 \
--gradient_accumulation_steps=1 \
--evaluation_strategy=no \
--num_train_epochs=3 \
--save_strategy=epoch \
--logging_strategy=steps \
--logging_steps=1 \
--save_total_limit=1 \
--remove_unused_columns=False \
--dataloader_num_workers=16 \
--warmup_ratio=0.03 \
--gradient_checkpointing=True \
--torch_compile=True \
--optim=adafactor \
--bf16 \
--deepspeed=./config/zero3.json \
--use_liger_kernel=True
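For context, the flags above are presumably consumed in train.py along these lines. This is a sketch, since the actual parsing code is not shown in the issue, and --max_length is not a TrainingArguments field, so it must come from a script-specific dataclass in the real train.py:
from transformers import HfArgumentParser, TrainingArguments

parser = HfArgumentParser(TrainingArguments)
# return_remaining_strings tolerates script-specific flags like --max_length
training_args, extra = parser.parse_args_into_dataclasses(return_remaining_strings=True)
assert training_args.use_liger_kernel  # the flag under suspicion in this issue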
Versions
Environment Report:
Operating System: Linux-5.15.0-1047-oracle-x86_64-with-glibc2.35
Python version: 3.10.14
PyTorch version: 2.4.0+cu121
CUDA version: 12.1
Triton version: 3.0.0
Transformers version: 4.45.0.dev0