Llama-3.2-3B-Instruct doesn't stop writing
Describe the bug
The model's response doesn't stop; it keeps writing. I tried both swift deploy and vLLM.
Training arguments:
HF_HUB_ENABLE_HF_TRANSFER=1 \
USE_HF=1 \
CUDA_VISIBLE_DEVICES=0,1 \
swift rlhf \
--rlhf_type kto \
--model_type llama3_2-3b-instruct \
--model_id_or_path meta-llama/Llama-3.2-3B-Instruct \
--model_revision master \
--beta 0.1 \
--desirable_weight 1.0 \
--undesirable_weight 1.0 \
--sft_type lora \
--use_dora True \
--neftune_noise_alpha 5 \
--tuner_backend peft \
--template_type AUTO \
--dtype AUTO \
--output_dir output \
--dataset ... (kto dataset) \
--train_dataset_sample -1 \
--num_train_epochs 2 \
--max_length 8192 \
--check_dataset_strategy warning \
--lora_rank 256 \
--lora_alpha 512 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 2 \
--weight_decay 0.1 \
--learning_rate 1e-4 \
--gradient_accumulation_steps 8 \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 100 \
--save_steps 100 \
--save_total_limit 2 \
--logging_steps 10 \
--use_flash_attn true
Your hardware and system info
GPU: RTX 3090 on RunPod, Torch 2.1
Additional context
SFT works with no issues, but I get the above issue during KTO. I tried LLaMA-Factory and it works, but the results are not good.
Adding a stop token should fix it; this happens with small LMs.
How do I add a stop token?
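For example, outside of swift you can pass the model's end-of-turn token as an extra EOS id at generation time. A minimal sketch with plain transformers (the checkpoint path is a placeholder, and the standard Llama 3 <|eot_id|> end-of-turn token is assumed):

# Minimal sketch (plain transformers, not the swift CLI): pass the model's
# end-of-turn token as an extra EOS id so generation stops there. Assumes the
# standard Llama 3 chat template with its <|eot_id|> token; the checkpoint
# path is a placeholder for your merged checkpoint directory.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path/to/merged-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "A misty mountain lake at dawn"}]  # example request
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")
output = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=[tokenizer.eos_token_id, eot_id],  # stop on either token
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))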
Could you share a screenshot?
Inference command:
USE_HF=1 \
swift infer \
--model_type llama3_2-3b-instruct \
--ckpt_dir '/root/llm-finetuning-setup/swift/output/llama3_2-3b-instruct/v0-20241003-081358/checkpoint-24-merged' \
--system "You are an expert in crafting Stable Diffusion prompts. Deliver a single, refined prompt per request, focusing on vivid visual details, artistic style, and composition. Emphasize quality descriptors and avoid explicit content. Your prompt should paint a clear mental image while allowing for creative interpretation. Be creative in your word choice and phrasing to inspire unique and captivating images. Output a single line of text without quotation marks or additional commentary."
DPO and SFT seem to work fine. The issue is with KTO.
@Jintao-Huang I tested it a lot more and it seems like the issue is with KTO itself, not the model. I can reproduce it on gemma-2-2b and mistral-instruct-v3 (I tested these three). All of these models, including Llama 3.2, work fine with DPO; only the KTO functionality is broken.
If you need any more info or logs, please let me know so we can fix this.
I'll check it out, please hold on.
You might try lowering the lora_rank and increasing the num_train_epochs, as I see that the number of training steps is quite low.
Try removing the system prompt during inference, or adding the system prompt during training.
It also happens on this dataset: https://huggingface.co/datasets/Cossale/informal-to-professional-kto using this script:
USE_HF=1 \
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
--rlhf_type kto \
--model_type mistral-7b-instruct-v3 \
--model_id_or_path mistralai/Mistral-7B-Instruct-v0.3 \
--beta 0.1 \
--desirable_weight 1.0 \
--undesirable_weight 1.0 \
--model_revision master \
--sft_type lora \
--tuner_backend peft \
--template_type AUTO \
--dtype AUTO \
--output_dir output \
--dataset Cossale/informal-to-professional-kto \
--train_dataset_sample -1 \
--num_train_epochs 3 \
--max_length 4096 \
--check_dataset_strategy warning \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--weight_decay 0.1 \
--learning_rate 1e-4 \
--gradient_accumulation_steps 4 \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 100 \
--save_steps 100 \
--save_total_limit 2 \
--logging_steps 10 \
--use_flash_attn true
What might the issue be with this script, then?
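One thing worth sanity-checking on the dataset side is how the desirable vs. undesirable examples are balanced, since the desirable_weight / undesirable_weight flags in the script exist to compensate for imbalance. A hedged sketch, assuming the standard prompt/completion/label KTO columns (adjust the field names if the dataset differs):

# Hedged sketch: inspect the KTO dataset layout and count desirable vs.
# undesirable examples. Assumes the standard prompt/completion/label columns
# and a "train" split; adjust if the dataset uses different names.
from collections import Counter
from datasets import load_dataset

ds = load_dataset("Cossale/informal-to-professional-kto", split="train")
print(ds.column_names)
print(ds[0])
print(Counter(bool(ex["label"]) for ex in ds))  # True = desirable, False = undesirable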
I ran into the same issue with my custom dataset and the Qwen2.5-VL 7B model.