ms-swift: about QLoRA training

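I see in the training list that reinforcement learning algorithms (such as GRPO) support parameter-efficient fine-tuning methods like QLoRA. How exactly is this done? Is there an example training script you could share?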

Apr 27 '25 04:04 Mrkkew

See the examples in these two places: https://github.com/modelscope/ms-swift/tree/main/examples/train/grpo and https://github.com/modelscope/ms-swift/blob/main/examples/train/qlora/bnb.sh

Here is a QLoRA + GRPO example:

CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B \
    --reward_funcs accuracy format \
    --train_type lora \
    --bnb_4bit_compute_dtype bfloat16 \
    --bnb_4bit_quant_type nf4 \
    --bnb_4bit_use_double_quant true \
    --quant_method bnb \
    --quant_bits 4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#1000' \
    --max_completion_length 1024 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 4 \
    --temperature 0.9 \
    --system 'examples/train/grpo/prompt.txt' \
    --log_completions true

Apr 27 '25 10:04 slin000111

OK. My quantized model is an AWQ model, and as far as I can tell the corresponding example script adds essentially no special parameters.

Apr 28 '25 01:04 Mrkkew

https://github.com/modelscope/ms-swift/blob/main/examples/train/qlora/awq.sh

Apr 28 '25 01:04 Mrkkew

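> OK. My quantized model is an AWQ model, and as far as I can tell the corresponding example script adds essentially no special parameters.

Yes.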
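
For a checkpoint that is already AWQ-quantized, the GRPO command would presumably be the bnb example above with the on-the-fly quantization flags (--quant_method, --quant_bits, --bnb_4bit_*) removed. A minimal sketch under that assumption follows. The model ID is only a placeholder for your own AWQ model, --torch_dtype is left unset on the assumption that the checkpoint's own dtype is then used, and the DRY_RUN switch is a convenience of this sketch, not an ms-swift feature. It has not been run against a real training job.

```shell
#!/bin/sh
# Sketch: GRPO + LoRA on a pre-quantized AWQ checkpoint.
# Assumption: MODEL is a placeholder; substitute your own AWQ model ID or path.
MODEL="${MODEL:-Qwen/Qwen2.5-7B-Instruct-AWQ}"

# Flags are a subset of the bnb example in this thread. The bnb-specific
# flags are dropped because the weights are already quantized.
set -- rlhf \
    --rlhf_type grpo \
    --model "$MODEL" \
    --reward_funcs accuracy format \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --dataset 'AI-MO/NuminaMath-TIR#1000' \
    --max_completion_length 1024 \
    --max_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 1 \
    --num_generations 4 \
    --temperature 0.9 \
    --output_dir output

# Dry run by default: only print the command. Set DRY_RUN=0 to launch training.
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "swift $*"
else
    CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:-0}" swift "$@"
fi
```

Running it as-is prints the assembled command so it can be compared with the bnb version before any GPU time is spent.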

Apr 28 '25 03:04 slin000111

Does this conflict with DeepSpeed ZeRO-3? When I use QLoRA together with zero3_offload, I get this error:

output tensor must have the same type as input tensor

May 07 '25 10:05 skepsun

@skepsun @Mrkkew @slin000111 I'm releasing an open-source framework that combines GRPO + QLoRA + DeepSpeed ZeRO-3: https://github.com/Minami-su/deepspeed-grpo-qlora-vllm

Jul 15 '25 06:07 Minami-su