Fine-tuning best practices for qwen2.5-72b-instruct and qwen2-vl-72b-instruct.
More docs:
qwen2-vl: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md
qwen1.5: https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md
We use ms-swift to run self-cognition fine-tuning on qwen2.5 and image OCR fine-tuning on qwen2-vl, and then run inference with the fine-tuned models.
Before starting fine-tuning, make sure your environment is installed correctly:
# Install ms-swift
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .[llm]
# qwen2-vl
# https://github.com/QwenLM/Qwen2-VL/issues/96
pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
# vLLM acceleration
pip install 'vllm>=0.6.1'
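Before launching a long job, it can be worth running a quick sanity check of the environment (a minimal sketch; adapt it to your setup if the packages or attributes differ in your versions):
# Check that the key packages import and report their versions
python -c "import swift; print(swift.__version__)"
python -c "import transformers; print(transformers.__version__)"
python -c "import vllm; print(vllm.__version__)"
# Check that PyTorch can see the GPUs
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"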
Large-model fine-tuning is usually done on a custom dataset; here we show a demo that can be run directly.
qwen2.5-72b-instruct
We perform self-cognition fine-tuning on Qwen2.5-72B-Instruct.
Self-cognition dataset: https://www.modelscope.cn/datasets/swift/self-cognition
General mixed datasets:
- https://www.modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-English
- https://www.modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese
Fine-tuning script:
# Experimental environment: 4 * A100
# GPU memory usage: 4 * 70GB
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen2_5-72b-instruct \
--model_id_or_path qwen/Qwen2.5-72B-Instruct \
--dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
--logging_steps 5 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope \
--system "You are a helpful assistant." \
--deepspeed default-zero3
# Example runnable on a single A10/3090 (Qwen2.5-7B-Instruct)
# GPU memory usage: 24GB
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen2_5-7b-instruct \
--model_id_or_path qwen/Qwen2.5-7B-Instruct \
--dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope \
--system "You are a helpful assistant."
For custom dataset documentation, see: https://github.com/modelscope/ms-swift/blob/main/docs/source/Instruction/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md
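As a minimal sketch of that format for the plain-text case (the query/response fields mirror the multimodal examples shown in the qwen2-vl section below; train.jsonl and its contents are placeholders), a custom dataset can be a JSONL file passed via --dataset:
# Create a tiny query/response dataset (placeholder content)
cat > train.jsonl <<'EOF'
{"query": "你是谁?", "response": "我是小黄,由魔搭训练的AI助手。"}
{"query": "Who are you?", "response": "I am Xiao Huang, an AI assistant trained by ModelScope."}
EOF
# Then swap the --dataset argument in the script above for:
# --dataset train.jsonl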
GPU memory usage during fine-tuning:
Loss visualization during fine-tuning:
The post-fine-tuning inference scripts are shown below; change ckpt_dir to the last checkpoint folder produced by training. vLLM can be used to accelerate inference on the merged checkpoint:
# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
--ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx
# merge-lora and use vLLM to accelerate inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
--ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
--ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx-merged \
--infer_backend vllm --max_model_len 8192 \
--tensor_parallel_size 4
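If an HTTP service is preferred over the interactive swift infer session, the merged checkpoint can also be served with swift deploy, which exposes an OpenAI-compatible API (a hedged sketch: port 8000 and the served model name are assumed defaults, so check the startup log for the actual values):
CUDA_VISIBLE_DEVICES=0,1,2,3 swift deploy \
--ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx-merged \
--infer_backend vllm --max_model_len 8192 \
--tensor_parallel_size 4
# In another shell, query the OpenAI-compatible endpoint
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "qwen2_5-72b-instruct", "messages": [{"role": "user", "content": "who are you?"}]}'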
Example of the fine-tuned model running inference on the validation set:
qwen2-vl-72b-instruct
We perform OCR fine-tuning on Qwen2-VL-72B-Instruct. For examples of grounding tasks and video fine-tuning, see the ms-swift docs: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md
Fine-tuning dataset: https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR
Fine-tuning script:
# Experimental environment: 8 * A100
SIZE_FACTOR=8 MAX_PIXELS=602112 \
NPROC_PER_NODE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift sft \
--model_type qwen2-vl-72b-instruct \
--model_id_or_path qwen/Qwen2-VL-72B-Instruct \
--sft_type lora \
--dataset latex-ocr-print#20000 \
--deepspeed default-zero3
To use a custom dataset, simply specify it as follows (a full command combining these flags is sketched after the format examples below):
# val_dataset is optional; if it is not specified, a portion of dataset will be split off as the validation set
--dataset train.jsonl \
--val_dataset val.jsonl \
Custom dataset format:
{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "<image><image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}
GPU memory usage during fine-tuning:
Loss visualization during fine-tuning (due to time constraints, only 250 steps were trained here):
The post-fine-tuning inference scripts are shown below; change ckpt_dir to the last checkpoint folder produced by training. vLLM can be used to accelerate inference on the merged checkpoint:
# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
--ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
--load_dataset_config true
# merge-lora and use vLLM to accelerate inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
--ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
--ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true --infer_backend vllm \
--tensor_parallel_size 4 --max_model_len 16384
Example of the fine-tuned model running inference on the validation set:
Does qwen2-vl support multi-image, multi-turn dialogue training?
Can I train 72b with 2*A6000? (2*48GB)
Does qwen2-vl support multi-image, multi-turn dialogue training?
Yes, it is supported.
Can I train 72b with 2*A6000? (2*48GB)
Maybe QLoRA:
# GPU Memory: 2 * 28GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
--model_type qwen2-vl-72b-instruct-gptq-int4 \
--model_id_or_path qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 \
--sft_type lora \
--dataset latex-ocr-print#20000
LoRA & device_map:
# GPU Memory: 2 * 75GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
--model_type qwen2-vl-72b-instruct \
--model_id_or_path qwen/Qwen2-VL-72B-Instruct \
--sft_type lora \
--dataset latex-ocr-print#20000
how to train qwen2_5-72b-instruct with 4090(24GB)*8 ? CUDA out of memory.
How is the training speed on A100?
Does qwen2-vl support multi-image, multi-turn dialogue training?
Yes, it is supported.
For this multi-turn, multi-image dialogue training, does every assistant reply participate in the loss computation, or only the last assistant reply?
They all participate in the loss.
Is pretraining supported for qwen2-vl?
After reading the data, training just stops, with no error reported:
Question: in multi-turn, multi-image dialogues, {"query": "
Could you give an example of how to call it? There is only a deployment example.
Does it work on a 910B?
Following the example, mp4 files given as a URL cannot be loaded. The network is fine; the local video was downloaded directly with wget.
<<< <video>Describe the video
Input a video path or URL <<< baby.mp4
[INFO:swift] Setting nframes: None. You can adjust this hyperparameter through the environment variable: `NFRAMES`.
[INFO:swift] Setting fps: None. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting size_factor: 2. You can adjust this hyperparameter through the environment variable: `SIZE_FACTOR`.
[INFO:swift] Setting min_frames: 4. You can adjust this hyperparameter through the environment variable: `MIN_FRAMES`.
[INFO:swift] Setting max_frames: 768. You can adjust this hyperparameter through the environment variable: `MAX_FRAMES`.
[INFO:swift] Setting min_pixels: 100352. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Setting total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: `TOTAL_PIXELS`.
[INFO:swift] Setting max_pixels: None. You can adjust this hyperparameter through the environment variable: `MAX_PIXELS`.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[W compiler_depend.ts:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
/root/.local/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:349: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
attention_mask[..., cu_seqlens[i - 1] : cu_seqlens[i], cu_seqlens[i - 1] : cu_seqlens[i]] = True
The video shows a young child playing with a book. She is wearing a light blue vest and pink pants, with a pair of black glasses, sitting on a bed and holding an open book. She first turns the pages with her right hand, then holds the book with her left hand while continuing to turn the pages with her right hand.
--------------------------------------------------
<<< <video>Describe the video
Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
Traceback (most recent call last):
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/cli/infer.py", line 5, in <module>
infer_main()
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/infer.py", line 414, in llm_infer
for response, new_history in gen:
File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 711, in inference_stream
inputs, tokenizer_kwargs, token_len, example = _prepare_inputs(
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 629, in _prepare_inputs
inputs, tokenizer_kwargs = template.encode(example)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 552, in encode
res = _encode(example)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 1485, in _encode
videos = load_batch(videos, load_video_qwen2)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 153, in load_batch
res.append(load_func(path))
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 280, in load_video_qwen2
video, _, info = io.read_video(
File "/root/.local/lib/python3.10/site-packages/torchvision/io/video.py", line 271, in read_video
raise RuntimeError(f"File not found: {filename}")
RuntimeError: File not found: https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
Moreover, when I replace the file with another local video file, it also raises an error.
<<< <video>Describe the video
Input a video path or URL <<< test.mp4
moov atom not found
[INFO:swift] Setting nframes: None. You can adjust this hyperparameter through the environment variable: `NFRAMES`.
[INFO:swift] Setting fps: None. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting size_factor: 2. You can adjust this hyperparameter through the environment variable: `SIZE_FACTOR`.
Traceback (most recent call last):
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/cli/infer.py", line 5, in <module>
infer_main()
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/infer.py", line 414, in llm_infer
for response, new_history in gen:
File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 711, in inference_stream
inputs, tokenizer_kwargs, token_len, example = _prepare_inputs(
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 629, in _prepare_inputs
inputs, tokenizer_kwargs = template.encode(example)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 552, in encode
res = _encode(example)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 1485, in _encode
videos = load_batch(videos, load_video_qwen2)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 153, in load_batch
res.append(load_func(path))
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 293, in load_video_qwen2
nframes = video.size(0) / info['video_fps'] * fps
KeyError: 'video_fps'
It can indeed be trained, but I ran into this problem and found it was caused by gradient_checkpointing being set to true; the codebase may not have been updated yet to match qwen2-vl. The workaround is simply to set gradient_checkpointing=false.
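For reference, a sketch of that workaround on the command line, based on the LoRA & device_map example above (--gradient_checkpointing is assumed to be the corresponding swift sft flag; verify it against your swift version):
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
--model_type qwen2-vl-72b-instruct \
--model_id_or_path qwen/Qwen2-VL-72B-Instruct \
--sft_type lora \
--dataset latex-ocr-print#20000 \
--gradient_checkpointing false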
When fine-tuning the VL model, can image_path be a URL?
A question: shouldn't the theoretical memory footprint of 72B in bf16 be 72B parameters * 2 bytes, i.e. 144GB? Why is only 70GB used per GPU here? Is int4 being used, or is there some mechanism I'm not aware of?
Is pretraining supported for qwen2-vl?
I'd like to ask the same thing. Did you find the pretraining code for qwen2-vl?
How should DISTRIBUTED_ARGS be configured for multi-node, multi-GPU training?
Is it possible to resize the images to a uniform shape?
Is it possible to resize the images to a uniform shape during fine-tuning?
@Jintao-Huang Does this apply to qwen2.5-vl?
For swift3, refer to this instead:
https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal
For GRPO LoRA fine-tuning of 72b-vl, is there a command line that can actually train successfully? I keep hitting out-of-memory no matter what I try, on 8 A100 cards. Thanks!
Qwen2.5-VL-72B-Instruct hits CUDA OOM during merge-lora; how should I handle it?
Following the example, mp4 files given as a URL cannot be loaded. The network is fine; the local video was downloaded directly with wget.
hello, did you solve this problem?