Fine-tuning best practices for qwen2.5-72b-instruct and qwen2-vl-72b-instruct.
More docs:
qwen2-vl: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md
qwen1.5: https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md
We use ms-swift to run self-cognition fine-tuning on qwen2.5 and image OCR fine-tuning on qwen2-vl, and then run inference with the fine-tuned models.
Before starting fine-tuning, make sure your environment is installed correctly:
# Install ms-swift
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .[llm]
# qwen2-vl
# https://github.com/QwenLM/Qwen2-VL/issues/96
pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
# vLLM acceleration
pip install 'vllm>=0.6.1'
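Before launching a long job, it can be worth running a quick sanity check of the environment (a minimal sketch; adapt it to your setup if the packages or attributes differ in your versions):
# Check that the key packages import and report their versions
python -c "import swift; print(swift.__version__)"
python -c "import transformers; print(transformers.__version__)"
python -c "import vllm; print(vllm.__version__)"
# Check that PyTorch can see the GPUs
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"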
Large-model fine-tuning is usually done on a custom dataset; here we show a demo that can be run directly.
qwen2.5-72b-instruct
We perform self-cognition fine-tuning on Qwen2.5-72B-Instruct.
Self-cognition dataset: https://www.modelscope.cn/datasets/swift/self-cognition
General mixed datasets:
- https://www.modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-English
- https://www.modelscope.cn/datasets/AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese
Fine-tuning script:
# Experimental environment: 4 * A100
# GPU memory usage: 4 * 70GB
NPROC_PER_NODE=4 CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type qwen2_5-72b-instruct \
--model_id_or_path qwen/Qwen2.5-72B-Instruct \
--dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
--logging_steps 5 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope \
--system "You are a helpful assistant." \
--deepspeed default-zero3
# Example runnable on a single A10/3090 (Qwen2.5-7B-Instruct)
# GPU memory usage: 24GB
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type qwen2_5-7b-instruct \
--model_id_or_path qwen/Qwen2.5-7B-Instruct \
--dataset qwen2-pro-en#500 qwen2-pro-zh#500 self-cognition#500 \
--logging_steps 5 \
--max_length 2048 \
--learning_rate 1e-4 \
--output_dir output \
--lora_target_modules ALL \
--model_name 小黄 'Xiao Huang' \
--model_author 魔搭 ModelScope \
--system "You are a helpful assistant."
For custom dataset documentation, see: https://github.com/modelscope/ms-swift/blob/main/docs/source/Instruction/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md
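As a minimal sketch of that format for the plain-text case (the query/response fields mirror the multimodal examples shown in the qwen2-vl section below; train.jsonl and its contents are placeholders), a custom dataset can be a JSONL file passed via --dataset:
# Create a tiny query/response dataset (placeholder content)
cat > train.jsonl <<'EOF'
{"query": "你是谁?", "response": "我是小黄,由魔搭训练的AI助手。"}
{"query": "Who are you?", "response": "I am Xiao Huang, an AI assistant trained by ModelScope."}
EOF
# Then swap the --dataset argument in the script above for:
# --dataset train.jsonl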
GPU memory usage during fine-tuning:
Loss visualization during fine-tuning:
The post-fine-tuning inference scripts are shown below; change ckpt_dir to the last checkpoint folder produced by training. vLLM can be used to accelerate inference on the merged checkpoint:
# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
--ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx
# merge-lora and use vLLM to accelerate inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
--ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
--ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx-merged \
--infer_backend vllm --max_model_len 8192 \
--tensor_parallel_size 4
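If an HTTP service is preferred over the interactive swift infer session, the merged checkpoint can also be served with swift deploy, which exposes an OpenAI-compatible API (a hedged sketch: port 8000 and the served model name are assumed defaults, so check the startup log for the actual values):
CUDA_VISIBLE_DEVICES=0,1,2,3 swift deploy \
--ckpt_dir output/qwen2_5-72b-instruct/vx-xxx/checkpoint-xxx-merged \
--infer_backend vllm --max_model_len 8192 \
--tensor_parallel_size 4
# In another shell, query the OpenAI-compatible endpoint
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "qwen2_5-72b-instruct", "messages": [{"role": "user", "content": "who are you?"}]}'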
Example of the fine-tuned model running inference on the validation set:
qwen2-vl-72b-instruct
We perform OCR fine-tuning on Qwen2-VL-72B-Instruct. For examples of grounding tasks and video fine-tuning, see the ms-swift docs: https://github.com/modelscope/ms-swift/blob/main/docs/source/Multi-Modal/qwen2-vl%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md
Fine-tuning dataset: https://modelscope.cn/datasets/AI-ModelScope/LaTeX_OCR
Fine-tuning script:
# Experimental environment: 8 * A100
SIZE_FACTOR=8 MAX_PIXELS=602112 \
NPROC_PER_NODE=8 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift sft \
--model_type qwen2-vl-72b-instruct \
--model_id_or_path qwen/Qwen2-VL-72B-Instruct \
--sft_type lora \
--dataset latex-ocr-print#20000 \
--deepspeed default-zero3
To use a custom dataset, simply specify it as follows (a full command combining these flags is sketched after the format examples below):
# val_dataset is optional; if it is not specified, a portion of dataset will be split off as the validation set
--dataset train.jsonl \
--val_dataset val.jsonl \
Custom dataset format:
{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "<image><image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}
GPU memory usage during fine-tuning:
Loss visualization during fine-tuning (due to time constraints, only 250 steps were trained here):
The post-fine-tuning inference scripts are shown below; change ckpt_dir to the last checkpoint folder produced by training. vLLM can be used to accelerate inference on the merged checkpoint:
# Direct inference
CUDA_VISIBLE_DEVICES=0,1 swift infer \
--ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
--load_dataset_config true
# merge-lora and use vLLM to accelerate inference
CUDA_VISIBLE_DEVICES=0,1 swift export \
--ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx \
--merge_lora true
CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer \
--ckpt_dir output/qwen2-vl-72b-instruct/vx-xxx/checkpoint-xxx-merged \
--load_dataset_config true --infer_backend vllm \
--tensor_parallel_size 4 --max_model_len 16384
Example of the fine-tuned model running inference on the validation set:
Does qwen2-vl support multi-image, multi-turn dialogue training?
Can I train 72b with 2*A6000? (2*48GB)
Does qwen2-vl support multi-image, multi-turn dialogue training?
Yes, it is supported.
Can I train 72b with 2*A6000? (2*48GB)
Maybe QLoRA:
# GPU Memory: 2 * 28GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
--model_type qwen2-vl-72b-instruct-gptq-int4 \
--model_id_or_path qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4 \
--sft_type lora \
--dataset latex-ocr-print#20000
LoRA & device_map:
# GPU Memory: 2 * 75GB
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
--model_type qwen2-vl-72b-instruct \
--model_id_or_path qwen/Qwen2-VL-72B-Instruct \
--sft_type lora \
--dataset latex-ocr-print#20000
how to train qwen2_5-72b-instruct with 4090(24GB)*8 ? CUDA out of memory.
How is the training speed on A100?
Does qwen2-vl support multi-image, multi-turn dialogue training?
Yes, it is supported.
For this multi-turn, multi-image dialogue training, does every assistant reply participate in the loss computation, or only the last assistant reply?
They all participate in the loss.
Is pretraining supported for qwen2-vl?
After reading the data, training just stops, with no error reported:
Question: in multi-turn, multi-image dialogues, {"query": "
Could you give an example of how to call it? There is only a deployment example.
Does it work on a 910B?
Following the example, mp4 files given as a URL cannot be loaded. The network is fine; the local video was downloaded directly with wget.
<<< <video>Describe the video
Input a video path or URL <<< baby.mp4
[INFO:swift] Setting nframes: None. You can adjust this hyperparameter through the environment variable: `NFRAMES`.
[INFO:swift] Setting fps: None. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting size_factor: 2. You can adjust this hyperparameter through the environment variable: `SIZE_FACTOR`.
[INFO:swift] Setting min_frames: 4. You can adjust this hyperparameter through the environment variable: `MIN_FRAMES`.
[INFO:swift] Setting max_frames: 768. You can adjust this hyperparameter through the environment variable: `MAX_FRAMES`.
[INFO:swift] Setting min_pixels: 100352. You can adjust this hyperparameter through the environment variable: `MIN_PIXELS`.
[INFO:swift] Setting total_pixels: 19267584. You can adjust this hyperparameter through the environment variable: `TOTAL_PIXELS`.
[INFO:swift] Setting max_pixels: None. You can adjust this hyperparameter through the environment variable: `MAX_PIXELS`.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: `RESIZED_HEIGHT`.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: `RESIZED_WIDTH`.
[W compiler_depend.ts:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
/root/.local/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:349: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:74.)
attention_mask[..., cu_seqlens[i - 1] : cu_seqlens[i], cu_seqlens[i - 1] : cu_seqlens[i]] = True
The video shows a young child playing with a book. She is wearing a light blue vest and pink pants, with a pair of black glasses, sitting on a bed and holding an open book. She first turns the pages with her right hand, then holds the book with her left hand while continuing to turn the pages with her right hand.
--------------------------------------------------
<<< <video>Describe the video
Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
Traceback (most recent call last):
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/cli/infer.py", line 5, in <module>
infer_main()
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/infer.py", line 414, in llm_infer
for response, new_history in gen:
File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 711, in inference_stream
inputs, tokenizer_kwargs, token_len, example = _prepare_inputs(
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 629, in _prepare_inputs
inputs, tokenizer_kwargs = template.encode(example)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 552, in encode
res = _encode(example)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 1485, in _encode
videos = load_batch(videos, load_video_qwen2)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 153, in load_batch
res.append(load_func(path))
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 280, in load_video_qwen2
video, _, info = io.read_video(
File "/root/.local/lib/python3.10/site-packages/torchvision/io/video.py", line 271, in read_video
raise RuntimeError(f"File not found: {filename}")
RuntimeError: File not found: https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
Moreover, when I replace the file with another local video file, it also raises an error.
<<< <video>Describe the video
Input a video path or URL <<< test.mp4
moov atom not found
[INFO:swift] Setting nframes: None. You can adjust this hyperparameter through the environment variable: `NFRAMES`.
[INFO:swift] Setting fps: None. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting size_factor: 2. You can adjust this hyperparameter through the environment variable: `SIZE_FACTOR`.
Traceback (most recent call last):
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/cli/infer.py", line 5, in <module>
infer_main()
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/infer.py", line 414, in llm_infer
for response, new_history in gen:
File "/root/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 711, in inference_stream
inputs, tokenizer_kwargs, token_len, example = _prepare_inputs(
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/utils.py", line 629, in _prepare_inputs
inputs, tokenizer_kwargs = template.encode(example)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 552, in encode
res = _encode(example)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/template.py", line 1485, in _encode
videos = load_batch(videos, load_video_qwen2)
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 153, in load_batch
res.append(load_func(path))
File "/root/work/filestorage/liujc/Qwen-VL/swift/swift/llm/utils/vision_utils.py", line 293, in load_video_qwen2
nframes = video.size(0) / info['video_fps'] * fps
KeyError: 'video_fps'
It can indeed be trained, but I ran into this problem and found it was caused by gradient_checkpointing being set to true; the codebase may not have been updated yet to match qwen2-vl. The workaround is simply to set gradient_checkpointing=false.
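For reference, a sketch of that workaround on the command line, based on the LoRA & device_map example above (--gradient_checkpointing is assumed to be the corresponding swift sft flag; verify it against your swift version):
SIZE_FACTOR=8 MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
--model_type qwen2-vl-72b-instruct \
--model_id_or_path qwen/Qwen2-VL-72B-Instruct \
--sft_type lora \
--dataset latex-ocr-print#20000 \
--gradient_checkpointing false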
When fine-tuning the VL model, can image_path be a URL?
A question: shouldn't the theoretical memory footprint of 72B in bf16 be 72B parameters * 2 bytes, i.e. 144GB? Why is only 70GB used per GPU here? Is int4 being used, or is there some mechanism I'm not aware of?
Is pretraining supported for qwen2-vl?
I'd like to ask the same thing. Did you find the pretraining code for qwen2-vl?
How should DISTRIBUTED_ARGS be configured for multi-node, multi-GPU training?
Is it possible to resize the images to a uniform shape?
Is it possible to resize the images to a uniform shape during fine-tuning?
@Jintao-Huang Does this apply to qwen2.5-vl?
For swift3, refer to this instead:
https://github.com/modelscope/ms-swift/tree/main/examples/train/multimodal
For GRPO LoRA fine-tuning of 72b-vl, is there a command line that can actually train successfully? I keep hitting out-of-memory no matter what I try, on 8 A100 cards. Thanks!
Qwen2.5-VL-72B-Instruct hits CUDA OOM during merge-lora; how should I handle it?
Following the example, mp4 files given as a URL cannot be loaded. The network is fine; the local video was downloaded directly with wget.
hello, did you solve this problem?