Llama-3.2-3B-Instruct doesn't stop writing
Describe the bug
The model's response doesn't stop; it keeps writing. I tried both swift deploy and vLLM.
Training arguments:
HF_HUB_ENABLE_HF_TRANSFER=1 \
USE_HF=1 \
CUDA_VISIBLE_DEVICES=0,1 \
swift rlhf \
--rlhf_type kto \
--model_type llama3_2-3b-instruct \
--model_id_or_path meta-llama/Llama-3.2-3B-Instruct \
--model_revision master \
--beta 0.1 \
--desirable_weight 1.0 \
--undesirable_weight 1.0 \
--sft_type lora \
--use_dora True \
--neftune_noise_alpha 5 \
--tuner_backend peft \
--template_type AUTO \
--dtype AUTO \
--output_dir output \
--dataset ... (kto dataset) \
--train_dataset_sample -1 \
--num_train_epochs 2 \
--max_length 8192 \
--check_dataset_strategy warning \
--lora_rank 256 \
--lora_alpha 512 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 2 \
--weight_decay 0.1 \
--learning_rate 1e-4 \
--gradient_accumulation_steps 8 \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 100 \
--save_steps 100 \
--save_total_limit 2 \
--logging_steps 10 \
--use_flash_attn true
Your hardware and system info
GPU: RTX 3090 on RunPod, Torch 2.1
Additional context
SFT works with no issues, but I get the above issue during KTO. I tried LLaMA-Factory and it works, but the results are not good.
Adding a stop token should fix it; this happens with small LMs.
How do I add a stop token?
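For example, outside of swift you can pass the model's end-of-turn token as an extra EOS id at generation time. A minimal sketch with plain transformers (the checkpoint path is a placeholder, and the standard Llama 3 <|eot_id|> end-of-turn token is assumed):

# Minimal sketch (plain transformers, not the swift CLI): pass the model's
# end-of-turn token as an extra EOS id so generation stops there. Assumes the
# standard Llama 3 chat template with its <|eot_id|> token; the checkpoint
# path is a placeholder for your merged checkpoint directory.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path/to/merged-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "A misty mountain lake at dawn"}]  # example request
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")
output = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=[tokenizer.eos_token_id, eot_id],  # stop on either token
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))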
Could you share a screenshot?
Inference command:
USE_HF=1 \
swift infer \
--model_type llama3_2-3b-instruct \
--ckpt_dir '/root/llm-finetuning-setup/swift/output/llama3_2-3b-instruct/v0-20241003-081358/checkpoint-24-merged' \
--system "You are an expert in crafting Stable Diffusion prompts. Deliver a single, refined prompt per request, focusing on vivid visual details, artistic style, and composition. Emphasize quality descriptors and avoid explicit content. Your prompt should paint a clear mental image while allowing for creative interpretation. Be creative in your word choice and phrasing to inspire unique and captivating images. Output a single line of text without quotation marks or additional commentary."
DPO and SFT seem to work fine. The issue is with KTO.
@Jintao-Huang I tested it a lot more and it seems like the issue is with KTO itself, not the model. I can reproduce it on gemma-2-2b and mistral-instruct-v3 (I tested these three). All of these models, including Llama 3.2, work fine with DPO; only the KTO functionality is broken.
If you need any more info or logs, please let me know so we can fix this.
I'll check it out, please hold on.
You might try lowering the lora_rank and increasing the num_train_epochs, as I see that the number of training steps is quite low.
Try removing the system prompt during inference, or adding the system prompt during training.
It also happens on this dataset: https://huggingface.co/datasets/Cossale/informal-to-professional-kto using this script:
USE_HF=1 \
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
--rlhf_type kto \
--model_type mistral-7b-instruct-v3 \
--model_id_or_path mistralai/Mistral-7B-Instruct-v0.3 \
--beta 0.1 \
--desirable_weight 1.0 \
--undesirable_weight 1.0 \
--model_revision master \
--sft_type lora \
--tuner_backend peft \
--template_type AUTO \
--dtype AUTO \
--output_dir output \
--dataset Cossale/informal-to-professional-kto \
--train_dataset_sample -1 \
--num_train_epochs 3 \
--max_length 4096 \
--check_dataset_strategy warning \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout_p 0.05 \
--lora_target_modules ALL \
--gradient_checkpointing true \
--batch_size 1 \
--weight_decay 0.1 \
--learning_rate 1e-4 \
--gradient_accumulation_steps 4 \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--eval_steps 100 \
--save_steps 100 \
--save_total_limit 2 \
--logging_steps 10 \
--use_flash_attn true
What might the issue be with this script, then?
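One thing worth sanity-checking on the dataset side is how the desirable vs. undesirable examples are balanced, since the desirable_weight / undesirable_weight flags in the script exist to compensate for imbalance. A hedged sketch, assuming the standard prompt/completion/label KTO columns (adjust the field names if the dataset differs):

# Hedged sketch: inspect the KTO dataset layout and count desirable vs.
# undesirable examples. Assumes the standard prompt/completion/label columns
# and a "train" split; adjust if the dataset uses different names.
from collections import Counter
from datasets import load_dataset

ds = load_dataset("Cossale/informal-to-professional-kto", split="train")
print(ds.column_names)
print(ds[0])
print(Counter(bool(ex["label"]) for ex in ds))  # True = desirable, False = undesirable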
I ran into the same issue with my custom dataset and the Qwen2.5-VL 7B model.