Liger-Kernel
Loss does not drop when using Liger Kernel with Qwen2.5
🐛 Describe the bug
I am trying to instruction-tune Qwen2.5-14B-Instruct with Liger Kernel.
I know that Liger Kernel is supported in the dev version of Hugging Face Transformers. However, when training the Qwen2.5 model with Liger Kernel enabled, the loss value does not drop. Is Qwen2.5 not supported yet?
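For what it's worth, Liger Kernel can also be applied through its own patching API instead of the Transformers integration, which could help isolate where the problem is. A minimal sketch, assuming the liger-kernel package is installed (Qwen2.5 shares the Qwen2 architecture, so the Qwen2 patch should be the relevant one):
from liger_kernel.transformers import apply_liger_kernel_to_qwen2

# Patch the Qwen2 modeling classes before the model is instantiated,
# so from_pretrained below picks up the Liger kernels.
apply_liger_kernel_to_qwen2()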
Reproduce
Python code example:
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer

model_name = "Qwen/Qwen2.5-14B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
...  # training_args and train_dataset are constructed here (elided)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
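One way to check whether the Trainer actually swapped in the Liger ops on this model instance is to look at the norm layers after Trainer construction. This is a sketch; the module path is an assumption based on liger-kernel's public layout:
from liger_kernel.transformers.rms_norm import LigerRMSNorm

# If the patch took effect on this model instance, the norm layers
# should be Liger classes; False here would point at the patching
# path rather than at the kernels themselves.
print(any(isinstance(m, LigerRMSNorm) for m in model.modules()))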
Run example:
deepspeed --include localhost:0,1 --master_port 61000 train.py \
--learning_rate=1e-5 \
--lr_scheduler_type=cosine \
--max_length=8192 \
--per_device_train_batch_size=4 \
--gradient_accumulation_steps=1 \
--evaluation_strategy=no \
--num_train_epochs=3 \
--save_strategy=epoch \
--logging_strategy=steps \
--logging_steps=1 \
--save_total_limit=1 \
--remove_unused_columns=False \
--dataloader_num_workers=16 \
--warmup_ratio=0.03 \
--gradient_checkpointing=True \
--torch_compile=True \
--optim=adafactor \
--bf16 \
--deepspeed=./config/zero3.json \
--use_liger_kernel=True
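For context, the flags above are presumably consumed in train.py along these lines. This is a sketch, since the actual parsing code is not shown in the issue, and --max_length is not a TrainingArguments field, so it must come from a script-specific dataclass in the real train.py:
from transformers import HfArgumentParser, TrainingArguments

parser = HfArgumentParser(TrainingArguments)
# return_remaining_strings tolerates script-specific flags like --max_length
training_args, extra = parser.parse_args_into_dataclasses(return_remaining_strings=True)
assert training_args.use_liger_kernel  # the flag under suspicion in this issue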
Versions
Environment Report:
Operating System: Linux-5.15.0-1047-oracle-x86_64-with-glibc2.35
Python version: 3.10.14
PyTorch version: 2.4.0+cu121
CUDA version: 12.1
Triton version: 3.0.0
Transformers version: 4.45.0.dev0