
Low GPU utilization when training Qwen3-VL-8B and 4B

Open Formula24Code opened this issue 2 months ago • 33 comments

Reminder

  • [x] I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.4.dev0
  • Platform: Linux-6.14.0-28-generic-x86_64-with-glibc2.39
  • Python version: 3.11.13
  • PyTorch version: 2.9.0-rc9 (GPU)
  • Transformers version: 4.57.1
  • Datasets version: 4.0.0
  • Accelerate version: 1.10.1
  • PEFT version: 0.17.1
  • GPU type: NVIDIA GeForce RTX 5090 D
  • GPU number: 2
  • GPU memory: 31.36GB
  • TRL version: 0.9.6
  • DeepSpeed version: 0.16.9
  • Bitsandbytes version: 0.48.1
  • Default data directory: detected

The llamafactory in use is built from commit 1037f63.

Reproduction

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /app/models/Qwen3-VL-8B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen3_vl_nothink \
    --flash_attn auto \
    --dataset_dir data \
    --dataset mllm_demo2 \
    --cutoff_len 10240 \
    --learning_rate 5e-05 \
    --num_train_epochs 100.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --packing False \
    --enable_thinking False \
    --report_to none \
    --output_dir saves/Qwen3-VL-8B-Instruct/lora/train_2025-10-16-10-19-04 \
    --bf16 True \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --optim adamw_torch \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --freeze_vision_tower False \
    --freeze_multi_modal_projector False \
    --image_max_pixels 960400 \
    --image_min_pixels 1024 \
    --video_max_pixels 65536 \
    --video_min_pixels 256

When training the Qwen3-VL 8B or 4B models on two GPUs, GPU utilization is very low (no errors are raised and training still runs).

Image
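For anyone wanting to log the same numbers over time rather than watching nvidia-smi, a minimal polling sketch like the one below works; it assumes the nvidia-ml-py (pynvml) package is installed, which is an extra dependency and not part of the report above.

import time
import pynvml

# Poll per-GPU SM utilization and memory use once per second, similar to `nvidia-smi dmon`.
pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(pynvml.nvmlDeviceGetCount())]
try:
    while True:
        stats = []
        for i, h in enumerate(handles):
            util = pynvml.nvmlDeviceGetUtilizationRates(h)  # .gpu and .memory are percentages
            mem = pynvml.nvmlDeviceGetMemoryInfo(h)         # .used and .total are in bytes
            stats.append(f"GPU{i}: {util.gpu:3d}% SM  {mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB")
        print(" | ".join(stats))
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()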

Others

In the same environment, training Qwen2.5-VL-7B with the following (nearly identical) command gives normal GPU utilization:

llamafactory-cli train \
    --stage sft \
    --do_train True \
    --model_name_or_path /app/models/Qwen2.5-VL-7B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen2_vl \
    --flash_attn auto \
    --dataset_dir data \
    --dataset mllm_demo2 \
    --cutoff_len 10240 \
    --learning_rate 5e-05 \
    --num_train_epochs 100.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --packing False \
    --enable_thinking False \
    --report_to none \
    --output_dir saves/Qwen2.5-VL-7B-Instruct/lora/train_2025-10-16-10-31-04 \
    --bf16 True \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --optim adamw_torch \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --freeze_vision_tower False \
    --freeze_multi_modal_projector False \
    --image_max_pixels 960400 \
    --image_min_pixels 1024 \
    --video_max_pixels 65536 \
    --video_min_pixels 256
Image

Formula24Code avatar Oct 16 '25 02:10 Formula24Code

Same issue here.

zzb213213 avatar Oct 16 '25 06:10 zzb213213

Is your training time normal? With the same settings, my SFT time for qwen3vl is several times that of qwen2.5vl.

fishfuck avatar Oct 18 '25 03:10 fishfuck

Probably an environment issue; a 4090 doesn't have this problem for me, it runs at full utilization.

Image

shidhi771 avatar Oct 19 '25 05:10 shidhi771

Image Similar issue here as well;

Image

training time increased by 2-3x.

mengzmd avatar Oct 20 '25 02:10 mengzmd

I have the same issue: with the same dataset and configuration, full-parameter SFT of qwen3vl-8b takes more than 3x as long as qwen2.5vl-7b.

Image

JunchenHuang777 avatar Oct 20 '25 07:10 JunchenHuang777

@mengzmd @JunchenHuang777 Did you get final results? Any metric drop? After switching my task to Qwen3-VL, the evaluation metric dropped by nearly 4 points :(

fishfuck avatar Oct 20 '25 07:10 fishfuck

Could everyone share their hardware, environment setup, and dataset characteristics here so we can figure out where the problem is? In our own testing we did not observe a particularly severe speed problem.

Kuangdd01 avatar Oct 20 '25 09:10 Kuangdd01

System Info gpu:2×A800 cuda:11.8 python3.10 Package Version


absl-py 1.3.0 annoy 1.17.3 apex 0.1 appdirs 1.4.4 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 args 0.1.0 asttokens 2.2.1 astunparse 1.6.3 atari-py 0.2.9 attrs 22.1.0 audioread 3.0.0 backcall 0.2.0 beautifulsoup4 4.11.1 bleach 5.0.1 blis 0.7.9 box2d-py 2.3.8 cachetools 5.2.0 catalogue 2.0.8 certifi 2022.12.7 cffi 1.15.1 chardet 3.0.4 charset-normalizer 2.1.1 click 8.1.3 clint 0.5.1 cloudpickle 2.2.0 cmake 3.24.1.1 comm 0.2.1 confection 0.0.3 contourpy 1.0.6 cuda-python 11.7.0+0.g95a2041.dirty cudf 22.10.0a0+316.gad1ba132d2.dirty cugraph 22.10.0a0+113.g6bbdadf8.dirty cuml 22.10.0a0+56.g3a8dea659.dirty cupy-cuda111 12.3.0 cupy-cuda118 11.0.0 cycler 0.11.0 cymem 2.0.7 Cython 0.29.32 cytoolz 0.12.2 dask 2022.9.2 dask-cuda 22.10.0a0+23.g62a1ee8 dask-cudf 22.10.0a0+316.gad1ba132d2.dirty dbus-python 1.2.16 debugpy 1.6.4 decorator 5.1.1 defusedxml 0.7.1 distlib 0.3.8 distributed 2022.9.2 distro 1.4.0 dlib 19.24.2 entrypoints 0.4 exceptiongroup 1.0.4 execnet 1.9.0 executing 1.2.0 expecttest 0.1.3 fastjsonschema 2.16.2 fastrlock 0.8.1 filelock 3.13.1 fonttools 4.38.0 fsspec 2022.11.0 funcsigs 1.0.2 google-auth 2.15.0 google-auth-oauthlib 0.4.6 gpg 1.13.1 graphsurgeon 0.4.6 grpcio 1.51.1 gym 0.26.2 gym-notices 0.0.8 gym-retro 0.8.0 HeapDict 1.0.1 hypothesis 5.35.1 idna 3.4 imagecodecs 2023.3.16 importlib-metadata 5.1.0 importlib-resources 5.10.1 incremental 22.10.0 iniconfig 1.1.1 intel-openmp 2021.4.0 ipykernel 6.19.2 ipython 8.7.0 ipython-genutils 0.2.0 ipywidgets 8.1.1 jedi 0.18.2 jellyfish 1.0.3 Jinja2 3.1.2 joblib 1.2.0 json5 0.9.10 jsonschema 4.17.3 jupyter_client 7.4.8 jupyter_core 5.1.0 jupyter-tensorboard 0.2.0 jupyterlab 2.3.2 jupyterlab-pygments 0.2.2 jupyterlab-server 1.2.0 jupyterlab-widgets 3.0.9 jupytext 1.14.4 kaggle 1.5.16 kiwisolver 1.4.4 langcodes 3.3.0 librosa 0.9.2 llvmlite 0.39.1 locket 1.0.0 Markdown 3.4.1 markdown-it-py 2.1.0 MarkupSafe 2.1.1 marshmallow 3.20.1 matplotlib 3.6.2 matplotlib-inline 0.1.6 mdit-py-plugins 0.3.3 mdurl 0.1.2 menpo 0.11.0 mistune 2.0.4 mkl 2021.1.1 mkl-devel 2021.1.1 mkl-include 2021.1.1 mock 4.0.3 mpmath 1.2.1 msgpack 1.0.4 murmurhash 1.0.9 nbclient 0.7.2 nbconvert 7.2.6 nbformat 5.7.0 nest-asyncio 1.5.6 networkx 2.6.3 notebook 6.4.10 numba 0.56.4 numpy 1.22.2 nvidia-dali-cuda110 1.20.0 nvidia-pyindex 1.0.9 nvtx 0.2.5 oauthlib 3.2.2 onnx 1.12.0 opencv 4.6.0 packaging 22.0 pandas 1.5.2 pandocfilters 1.5.0 parso 0.8.3 partd 1.3.0 path 16.9.0 path.py 12.5.0 pathlib2 2.3.7.post1 pathy 0.10.1 pbr 6.0.0 pexpect 4.8.0 pickleshare 0.7.5 Pillow 9.2.0 pip 21.2.4 pkgutil_resolve_name 1.3.10 platformdirs 4.1.0 plotly 5.18.0 pluggy 1.0.0 polygraphy 0.43.1 pooch 1.6.0 preshed 3.0.8 prettytable 3.5.0 prometheus-client 0.15.0 prompt-toolkit 3.0.36 protobuf 3.20.1 psutil 5.9.4 ptyprocess 0.7.0 pure-eval 0.2.2 pyarrow 9.0.0 pyasn1 0.4.8 pyasn1-modules 0.2.8 pybind11 2.10.1 pycocotools 2.0+nv0.7.1 pycparser 2.21 pycrypto 2.6.1 pydantic 1.10.2 pyemd 1.0.0 pyglet 1.5.28 Pygments 2.13.0 PyGObject 3.36.0 pylibcugraph 22.10.0a0+113.g6bbdadf8.dirty pylibraft 22.10.0a0+81.g08abc72.dirty pynvml 11.4.1 pynvrtc 9.2 PyOpenGL 3.1.7 PyOpenGL-accelerate 3.1.7 pyparsing 3.0.9 pyphen 0.14.0 pyrsistent 0.19.2 pytest 7.2.0 pytest-rerunfailures 10.3 pytest-shard 0.1.2 pytest-xdist 3.1.0 python-dateutil 2.8.2 python-hostlist 1.22 python-slugify 8.0.1 pytorch-quantization 2.1.2 pytz 2022.6 PyYAML 6.0 pyzmq 24.0.1 qtconsole 5.5.1 QtPy 2.4.1 raft-dask 22.10.0a0+81.g08abc72.dirty raven 6.10.0 regex 2022.10.31 requests 2.28.1 requests-oauthlib 1.3.1 requests-toolbelt 1.0.0 resampy 0.4.2 
retrowrapper 0.3.0 rmm 22.10.0a0+38.ge043158.dirty rsa 4.9 scikit-learn 0.24.2 scipy 1.6.3 seaborn 0.13.1 Send2Trash 1.8.0 setuptools 59.5.0 six 1.16.0 smart-open 6.3.0 sortedcontainers 2.4.0 soundfile 0.11.0 soupsieve 2.3.2.post1 spacy 3.4.4 spacy-legacy 3.0.10 spacy-loggers 1.0.4 sphinx-glpi-theme 0.3 srsly 2.4.5 ssh-import-id 5.10 stack-data 0.6.2 sympy 1.11.1 tbb 2021.7.1 tblib 1.7.0 tenacity 8.2.3 tensorboard 2.9.0 tensorboard-data-server 0.6.1 tensorboard-plugin-wit 1.8.1 tensorrt 8.5.1.7 terminado 0.17.1 text-unidecode 1.3 thinc 8.1.5 threadpoolctl 3.1.0 tifffile 2023.7.10 tinycss2 1.2.1 toml 0.10.2 tomli 2.0.1 toolz 0.12.0 torch 1.14.0a0+410ce96 torch-tensorrt 1.3.0a0 torchtext 0.13.0a0+fae8e8c torchvision 0.15.0a0 tornado 6.4 tqdm 4.64.1 traitlets 5.7.1 transformer-engine 0.3.0 treelite 2.4.0 treelite-runtime 2.4.0 typer 0.7.0 typing_extensions 4.4.0 ucx-py 0.27.0a0+29.ge9e81f8 uff 0.6.9 urllib3 1.26.13 virtualenv 20.25.0 visdom 0.2.4 wasabi 0.10.1 wcwidth 0.2.5 webencodings 0.5.1 websocket-client 1.7.0 Werkzeug 2.2.2 wheel 0.38.4 widgetsnbextension 4.0.9 xdoctest 1.0.2 xgboost 1.6.2 zict 2.2.0 zipp 3.11.0 zmq 0.0.0 训练参数

model

model_name_or_path: /dfs/data/qwen3_vl_8b image_max_pixels: 1048576

method

stage: sft do_train: true finetuning_type: full freeze_vision_tower: false freeze_multi_modal_projector: false freeze_language_model: false deepspeed: examples/deepspeed/ds_z3_config.json # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]

dataset

dataset: lingbujian_fuza_prompt template: qwen3_vl cutoff_len: 2048 max_samples: 30000 overwrite_cache: true preprocessing_num_workers: 16

output

output_dir: saves/qwen3vl/full/sft save_steps: 500 plot_loss: true overwrite_output_dir: true

train

per_device_train_batch_size: 1 gradient_accumulation_steps: 8 learning_rate: 1.0e-5 num_train_epochs: 3.0 lr_scheduler_type: cosine warmup_ratio: 0.1 bf16: true ddp_timeout: 180000000

eval

per_device_eval_batch_size: 1 eval_strategy: steps eval_steps: 2000 eval_dataset: lingbujian_val compute_accuracy: true

The training parameters are the same as for qwen2.5_vl. The dataset task is image classification, with the class label as the answer: 24k training samples and 7k validation samples. With the same data and training parameters, full-parameter fine-tuning of qwen2.5_vl took about 160h.

mengzmd avatar Oct 20 '25 09:10 mengzmd

@Kuangdd01 The parameter controlling image size in Qwen3-VL seems to differ from Qwen2.5-VL: it changed from IMAGE_MAX_PIXELS to IMAGE_MAX_TOKEN_NUM, see https://github.com/QwenLM/Qwen3-VL/blob/6d08b04928bd3914b353f833dfe71de83989dfb9/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L26-L29. Has this been adapted for? My initial suspicion is that this is the cause.
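If the limit did change from a pixel budget to a token budget, the two are related by a constant factor; a rough conversion sketch is below. The patch size of 16 and spatial merge factor of 2 are assumed values for the Qwen3-VL vision tower, not values confirmed in this thread, so treat the numbers as illustrative only.

# Rough pixel <-> vision-token conversion. patch_size and merge_size are
# assumptions about the Qwen3-VL vision tower, not values taken from this thread.
patch_size = 16                                      # assumed ViT patch edge in pixels
merge_size = 2                                       # assumed spatial merge factor of the projector
pixels_per_token = (patch_size * merge_size) ** 2    # 32 x 32 = 1024 pixels per merged token

def pixels_to_tokens(max_pixels: int) -> int:
    return max_pixels // pixels_per_token

def tokens_to_pixels(max_tokens: int) -> int:
    return max_tokens * pixels_per_token

print(pixels_to_tokens(960400))   # the image_max_pixels used above -> about 937 tokens
print(tokens_to_pixels(16384))    # a 16384-token cap -> about 16.8M pixels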

fishfuck avatar Oct 20 '25 11:10 fishfuck

System Info gpu:8×H100 cuda:12.3 python3.10

Package Version

accelerate 1.10.1 aiofiles 24.1.0 aiohappyeyeballs 2.6.1 aiohttp 3.13.1 aiosignal 1.4.0 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.11.0 async-timeout 5.0.1 attrs 25.4.0 audioread 3.0.1 av 16.0.1 Brotli 1.1.0 certifi 2025.10.5 cffi 2.0.0 charset-normalizer 3.4.4 click 8.3.0 contourpy 1.3.2 cycler 0.12.1 datasets 4.0.0 decorator 5.2.1 deepspeed 0.16.9 dill 0.3.8 docstring_parser 0.17.0 einops 0.8.1 exceptiongroup 1.3.0 fastapi 0.119.0 ffmpy 0.6.3 filelock 3.20.0 fire 0.7.1 fonttools 4.60.1 frozenlist 1.8.0 fsspec 2025.3.0 gradio 5.45.0 gradio_client 1.13.0 groovy 0.1.2 h11 0.16.0 hf_transfer 0.1.9 hf-xet 1.1.10 hjson 3.1.0 httpcore 1.0.9 httpx 0.28.1 huggingface-hub 0.35.3 idna 3.11 jieba 0.42.1 Jinja2 3.1.6 joblib 1.5.2 kiwisolver 1.4.9 lazy_loader 0.4 librosa 0.11.0 llamafactory 0.9.4.dev0 llvmlite 0.45.1 markdown-it-py 4.0.0 MarkupSafe 3.0.3 matplotlib 3.10.7 mdurl 0.1.2 modelscope 1.31.0 mpmath 1.3.0 msgpack 1.1.2 multidict 6.7.0 multiprocess 0.70.16 networkx 3.4.2 ninja 1.13.0 nltk 3.9.2 numba 0.62.1 numpy 1.26.4 nvidia-cublas-cu12 12.8.4.1 nvidia-cuda-cupti-cu12 12.8.90 nvidia-cuda-nvrtc-cu12 12.8.93 nvidia-cuda-runtime-cu12 12.8.90 nvidia-cudnn-cu12 9.10.2.21 nvidia-cufft-cu12 11.3.3.83 nvidia-cufile-cu12 1.13.1.3 nvidia-curand-cu12 10.3.9.90 nvidia-cusolver-cu12 11.7.3.90 nvidia-cusparse-cu12 12.5.8.93 nvidia-cusparselt-cu12 0.7.1 nvidia-ml-py 13.580.82 nvidia-nccl-cu12 2.27.5 nvidia-nvjitlink-cu12 12.8.93 nvidia-nvshmem-cu12 3.3.20 nvidia-nvtx-cu12 12.8.90 omegaconf 2.3.0 orjson 3.11.3 packaging 25.0 pandas 2.3.3 peft 0.17.1 pillow 11.3.0 pip 24.0 platformdirs 4.5.0 pooch 1.8.2 propcache 0.4.1 protobuf 6.33.0 psutil 7.1.0 py-cpuinfo 9.0.0 pyarrow 21.0.0 pycparser 2.23 pydantic 2.10.6 pydantic_core 2.27.2 pydub 0.25.1 Pygments 2.19.2 pyparsing 3.2.5 python-dateutil 2.9.0.post0 python-multipart 0.0.20 pytz 2025.2 PyYAML 6.0.3 regex 2025.9.18 requests 2.32.5 rich 14.2.0 rouge-chinese 1.0.3 ruff 0.14.1 safehttpx 0.1.6 safetensors 0.5.3 scikit-learn 1.7.2 scipy 1.15.3 semantic-version 2.10.0 sentencepiece 0.2.1 setuptools 69.1.0 shellingham 1.5.4 shtab 1.7.2 six 1.17.0 sniffio 1.3.1 soundfile 0.13.1 soxr 1.0.0 sse-starlette 3.0.2 starlette 0.48.0 sympy 1.14.0 termcolor 3.1.0 threadpoolctl 3.6.0 tiktoken 0.12.0 tokenizers 0.22.1 tomlkit 0.13.3 torch 2.9.0 torchvision 0.24.0 tqdm 4.67.1 transformers 4.57.1 triton 3.5.0 trl 0.9.6 typer 0.19.2 typing_extensions 4.15.0 tyro 0.8.14 tzdata 2025.2 urllib3 2.5.0 uvicorn 0.38.0 websockets 15.0.1 wheel 0.42.0 xxhash 3.6.0 yarl 1.22.0

Training parameters

model

model_name_or_path: ./qwen3-vl-8b image_max_pixels: 262144 video_max_pixels: 16384 trust_remote_code: true

method

stage: sft do_train: true finetuning_type: full freeze_vision_tower: true freeze_multi_modal_projector: true freeze_language_model: false deepspeed: examples/deepspeed/ds_z3_config.json

dataset

dataset: video_test template: qwen3_vl cutoff_len: 4096 overwrite_cache: false preprocessing_num_workers: 8 dataloader_num_workers: 8 tokenized_path: ./cache/video_test_tokenized

output

output_dir: ./sft_test logging_steps: 10 save_strategy: epoch plot_loss: true overwrite_output_dir: true save_only_model: false report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]

train

per_device_train_batch_size: 4 gradient_accumulation_steps: 1 learning_rate: 1.0e-5 num_train_epochs: 3.0 lr_scheduler_type: cosine warmup_ratio: 0.1 bf16: true ddp_timeout: 180000000 resume_from_checkpoint: null

eval

val_size: 0.1 per_device_eval_batch_size: 1 eval_strategy: epoch

The dataset is all videos, 52k samples. With the same data and configuration, qwen3vl-8b training takes 119h (excluding the video preprocessing and encoding stage), while 2.5vl-7b takes about 16h.

JunchenHuang777 avatar Oct 20 '25 11:10 JunchenHuang777

System Info gpu: 8xH20 cuda 12.2 python 3.10

pip packages: Package Version Editable project location


accelerate 1.10.1 aiofiles 24.1.0 aiohappyeyeballs 2.6.1 aiohttp 3.13.0 aiosignal 1.4.0 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.11.0 async-timeout 5.0.1 attrs 25.4.0 audioread 3.0.1 av 16.0.1 boto3 1.40.55 botocore 1.40.55 Brotli 1.1.0 certifi 2025.10.5 cffi 2.0.0 charset-normalizer 3.4.4 click 8.3.0 contourpy 1.3.2 cycler 0.12.1 datasets 4.0.0 decorator 5.2.1 dill 0.3.8 docstring_parser 0.17.0 einops 0.8.1 exceptiongroup 1.3.0 fastapi 0.119.0 ffmpy 0.6.3 filelock 3.20.0 fire 0.7.1 fonttools 4.60.1 frozenlist 1.8.0 fsspec 2025.3.0 gradio 5.45.0 gradio_client 1.13.0 groovy 0.1.2 h11 0.16.0 hf_transfer 0.1.9 hf-xet 1.1.10 httpcore 1.0.9 httpx 0.28.1 huggingface-hub 0.35.3 idna 3.11 jieba 0.42.1 Jinja2 3.1.6 jmespath 1.0.1 joblib 1.5.2 kiwisolver 1.4.9 lazy_loader 0.4 librosa 0.11.0 llamafactory 0.9.4.dev0 /home/work/workspace/llama_venv/LLaMA-Factory llvmlite 0.45.1 markdown-it-py 4.0.0 MarkupSafe 3.0.3 matplotlib 3.10.7 mdurl 0.1.2 modelscope 1.31.0 mpmath 1.3.0 msgpack 1.1.2 multidict 6.7.0 multiprocess 0.70.16 networkx 3.4.2 nltk 3.9.2 numba 0.62.1 numpy 1.26.4 nvidia-cublas-cu12 12.8.4.1 nvidia-cuda-cupti-cu12 12.8.90 nvidia-cuda-nvrtc-cu12 12.8.93 nvidia-cuda-runtime-cu12 12.8.90 nvidia-cudnn-cu12 9.10.2.21 nvidia-cufft-cu12 11.3.3.83 nvidia-cufile-cu12 1.13.1.3 nvidia-curand-cu12 10.3.9.90 nvidia-cusolver-cu12 11.7.3.90 nvidia-cusparse-cu12 12.5.8.93 nvidia-cusparselt-cu12 0.7.1 nvidia-ml-py 13.580.82 nvidia-nccl-cu12 2.27.5 nvidia-nvjitlink-cu12 12.8.93 nvidia-nvshmem-cu12 3.3.20 nvidia-nvtx-cu12 12.8.90 omegaconf 2.3.0 orjson 3.11.3 packaging 25.0 pandas 2.3.3 peft 0.17.1 pillow 11.3.0 pip 25.2 platformdirs 4.5.0 pooch 1.8.2 prettytable 3.16.0 propcache 0.4.1 protobuf 6.33.0 psutil 7.1.0 pyarrow 21.0.0 pycparser 2.23 pydantic 2.10.6 pydantic_core 2.27.2 pydub 0.25.1 pyecharts 2.0.9 Pygments 2.19.2 pyparsing 3.2.5 python-dateutil 2.9.0.post0 python-multipart 0.0.20 pytz 2025.2 PyYAML 6.0.3 regex 2025.9.18 requests 2.32.5 rich 13.9.4 rouge-chinese 1.0.3 ruff 0.14.1 s3transfer 0.14.0 safehttpx 0.1.6 safetensors 0.5.3 scikit-learn 1.7.2 scipy 1.15.3 semantic-version 2.10.0 sentencepiece 0.2.1 setuptools 80.9.0 shellingham 1.5.4 shtab 1.7.2 simplejson 3.20.2 six 1.17.0 sniffio 1.3.1 soundfile 0.13.1 soxr 1.0.0 sse-starlette 3.0.2 starlette 0.48.0 swanlab 0.6.12 sympy 1.14.0 termcolor 3.1.0 threadpoolctl 3.6.0 tiktoken 0.12.0 tokenizers 0.22.1 tomlkit 0.13.3 torch 2.9.0 torchvision 0.24.0 tqdm 4.67.1 transformers 4.57.1 triton 3.5.0 trl 0.9.6 typer 0.19.2 typing_extensions 4.15.0 tyro 0.8.14 tzdata 2025.2 urllib3 2.5.0 uvicorn 0.37.0 wcwidth 0.2.14 websockets 15.0.1 wrapt 2.0.0 xxhash 3.6.0 yarl 1.22.0

Training parameters: model_name_or_path: /home/work/bos/Qwen3-VL-8B-Instruct image_max_pixels: 853000 # 1000 × 853 video_max_pixels: 16384

stage: sft do_train: true finetuning_type: lora lora_rank: 8 lora_target: all

dataset: poi_scene_pano_v1_train # video: mllm_video_demo template: qwen3_vl cutoff_len: 4096 max_samples: 10000 overwrite_cache: true preprocessing_num_workers: 16 dataloader_num_workers: 4

eval_dataset: poi_scene_pano_v1_val

do_eval: true per_device_eval_batch_size: 1 eval_strategy: steps eval_steps: 100 predict_with_generate: true

output_dir: saves/qwen3_vl-8b/lora/sft_v1 logging_steps: 10 save_steps: 500 plot_loss: true overwrite_output_dir: true save_only_model: false

per_device_train_batch_size: 4 gradient_accumulation_steps: 8 learning_rate: 1.0e-4 num_train_epochs: 10.0 lr_scheduler_type: cosine warmup_ratio: 0.1 bf16: true ddp_timeout: 180000000 resume_from_checkpoint: null

use_swanlab: true swanlab_project: llamafactory_qwen3vl_8b swanlab_run_name: poi_scene_pano_lora_v1

Training time: 79 hours for 10k samples; 2.5vl-7b took about 2h.

Naieo avatar Oct 20 '25 12:10 Naieo

Same environment and launch method, but I get this error. Did you run into it, and how did you solve it? Could you help take a look? ValueError: Processor was not found, please check and update your model file.

kanqgg avatar Oct 22 '25 02:10 kanqgg

系统: 4卡4090 24G NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 python 3.11 环境: absl-py==2.3.1 accelerate==1.11.0 aiofiles==24.1.0 aiohappyeyeballs==2.6.1 aiohttp==3.13.1 aiosignal==1.4.0 annotated-types==0.7.0 antlr4-python3-runtime==4.9.3 anyio==4.11.0 attrs==25.4.0 audioread==3.0.1 av==16.0.1 Brotli==1.1.0 certifi==2025.10.5 cffi==2.0.0 charset-normalizer==3.4.4 click==8.3.0 contourpy==1.3.3 cycler==0.12.1 datasets==4.0.0 decorator==5.2.1 deepspeed==0.16.9 dill==0.3.8 docstring_parser==0.17.0 einops==0.8.1 fastapi==0.119.1 ffmpy==0.6.3 filelock==3.20.0 fire==0.7.1 flash_attn==2.8.3 fonttools==4.60.1 frozenlist==1.8.0 fsspec==2025.3.0 gradio==5.45.0 gradio_client==1.13.0 groovy==0.1.2 grpcio==1.76.0 h11==0.16.0 hf-xet==1.1.10 hf_transfer==0.1.9 hjson==3.1.0 httpcore==1.0.9 httpx==0.28.1 huggingface-hub==0.35.3 idna==3.11 jieba==0.42.1 Jinja2==3.1.6 joblib==1.5.2 kiwisolver==1.4.9 lazy_loader==0.4 librosa==0.11.0 Editable install with no version control (llamafactory==0.9.4.dev0) -e /path_to/LLaMA-Factory llvmlite==0.45.1 Markdown==3.9 markdown-it-py==4.0.0 MarkupSafe==3.0.3 matplotlib==3.10.7 mdurl==0.1.2 modelscope==1.31.0 mpmath==1.3.0 msgpack==1.1.2 multidict==6.7.0 multiprocess==0.70.16 networkx==3.5 ninja==1.13.0 nltk==3.9.2 numba==0.62.1 numpy==1.26.4 nvidia-cublas-cu12==12.8.4.1 nvidia-cuda-cupti-cu12==12.8.90 nvidia-cuda-nvrtc-cu12==12.8.93 nvidia-cuda-runtime-cu12==12.8.90 nvidia-cudnn-cu12==9.10.2.21 nvidia-cufft-cu12==11.3.3.83 nvidia-cufile-cu12==1.13.1.3 nvidia-curand-cu12==10.3.9.90 nvidia-cusolver-cu12==11.7.3.90 nvidia-cusparse-cu12==12.5.8.93 nvidia-cusparselt-cu12==0.7.1 nvidia-ml-py==13.580.82 nvidia-nccl-cu12==2.27.5 nvidia-nvjitlink-cu12==12.8.93 nvidia-nvshmem-cu12==3.3.20 nvidia-nvtx-cu12==12.8.90 omegaconf==2.3.0 orjson==3.11.3 packaging==25.0 pandas==2.3.3 peft==0.17.1 pillow==11.3.0 platformdirs==4.5.0 pooch==1.8.2 propcache==0.4.1 protobuf==6.33.0 psutil==7.1.1 py-cpuinfo==9.0.0 pyarrow==21.0.0 pycparser==2.23 pydantic==2.10.6 pydantic_core==2.27.2 pydub==0.25.1 Pygments==2.19.2 pyparsing==3.2.5 python-dateutil==2.9.0.post0 python-multipart==0.0.20 pytz==2025.2 PyYAML==6.0.3 regex==2025.10.22 requests==2.32.5 rich==14.2.0 rouge-chinese==1.0.3 ruff==0.14.1 safehttpx==0.1.6 safetensors==0.5.3 scikit-learn==1.7.2 scipy==1.16.2 semantic-version==2.10.0 sentencepiece==0.2.1 shellingham==1.5.4 shtab==1.7.2 six==1.17.0 sniffio==1.3.1 soundfile==0.13.1 soxr==1.0.0 sse-starlette==3.0.2 starlette==0.48.0 sympy==1.14.0 tensorboard==2.20.0 tensorboard-data-server==0.7.2 termcolor==3.1.0 threadpoolctl==3.6.0 tiktoken==0.12.0 tokenizers==0.22.1 tomlkit==0.13.3 torch==2.9.0 torchvision==0.24.0 tqdm==4.67.1 transformers==4.57.1 triton==3.5.0 trl==0.9.6 typer==0.20.0 typing_extensions==4.15.0 tyro==0.8.14 tzdata==2025.2 urllib3==2.5.0 uvicorn==0.38.0 websockets==15.0.1 Werkzeug==3.1.3 xxhash==3.6.0 yarl==1.22.0 运行脚本 model_name_or_path: /hdd/wangty/model/Qwen2.5-VL-3B-Instruct trust_remote_code: true image_max_pixels: 262144 video_max_pixels: 16384

stage: sft deepspeed: /hdd/wangty/new_task/LLaMA-Factory/examples/deepspeed/ds_z2_config.json do_train: true finetuning_type: lora lora_rank: 128 lora_alpha: 256 lora_dropout: 0.05 lora_target: all freeze_multi_modal_projector: false freeze_vision_tower: false

dataset_dir: /hdd/wangty/new_task/LLaMA-Factory/task/dataset/qwen/zyzg dataset: zyzg_sag+tra_crop_train template: qwen2_vl cutoff_len: 2048 max_samples: 50000 overwrite_cache: true preprocessing_num_workers: 16 dataloader_num_workers: 8 dataloader_pin_memory: true dataloader_persistent_workers: false

output_dir: /hdd/wangty/new_task/LLaMA-Factory/task/work_dirs/qwen2.5vl_3b/zyzg/test_gpu logging_steps: 10 save_strategy: "best" load_best_model_at_end: true metric_for_best_model: 'eval_zyzg_sag+tra_crop_val_accuracy' save_total_limit: 1 overwrite_output_dir: true report_to: tensorboard

per_device_train_batch_size: 4 gradient_accumulation_steps: 1 learning_rate: 5.0e-5 num_train_epochs: 10.0 lr_scheduler_type: cosine warmup_ratio: 0.05 weight_decay: 0.0 bf16: true gradient_checkpointing: true ddp_timeout: 180000000

eval_dataset: zyzg_sag+tra_crop_val per_device_eval_batch_size: 4 eval_strategy: epoch do_sample: false compute_accuracy: true

For the qwen3 version I only swapped the model and template; nothing else changed. Training time (the training images are very low resolution): qwen3-vl-4B: 30h, GPU utilization around 30%; qwen2.5-vl-3B: 1h 16min, GPU utilization normal.

maver1ckzz avatar Oct 22 '25 03:10 maver1ckzz


Your transformers version is out of date.

mengzmd avatar Oct 22 '25 03:10 mengzmd

requests 2.32.5 rich 14.2.0 ruff 0.14.1 safehttpx 0.1.6 safetensors 0.5.3 scikit-learn 1.7.2 scipy 1.16.2 semantic-version 2.10.0 sentencepiece 0.2.1 setuptools 80.9.0 shellingham 1.5.4 shtab 1.7.2 six 1.17.0 sniffio 1.3.1 soundfile 0.13.1 soxr 1.0.0 sse-starlette 3.0.2 starlette 0.48.0 sympy 1.14.0 termcolor 3.1.0 threadpoolctl 3.6.0 tiktoken 0.12.0 tokenizers 0.22.1 tomlkit 0.13.3 torch 2.9.0 tqdm 4.67.1 transformers 4.57.1 triton 3.5.0 trl 0.9.6 typer 0.19.2 typing_extensions 4.15.0 tyro 0.8.14 tzdata 2025.2 urllib3 2.5.0 uvicorn 0.38.0 websockets 15.0.1 wheel 0.45.1 xxhash 3.6.0 yarl 1.22.0. But my transformers is already updated to 4.57.1, and my llamafactory is a fresh pull of the latest code.


kanqgg avatar Oct 22 '25 03:10 kanqgg


Same issue, have you solved it?

sleepyshep avatar Oct 22 '25 05:10 sleepyshep

Image I have the same problem. At first I thought it was an anomaly in my H20 GPUs' BF16 computation, but the problem persisted after I fixed that error. Qwen2.5-VL-7B takes about 2 hours to run, while under the same conditions this model takes 30 hours, a 15x increase. GPU memory utilization is very low and the computation does not look continuous; perhaps a lot of CPU operations are being performed?
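One way to test the "many CPU operations" suspicion is to profile a few steps and see whether host-side work dominates. A minimal, self-contained torch.profiler sketch is below; the dummy model and batch are placeholders so it runs on its own, and the real training step would be substituted in an actual LLaMA-Factory run.

import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Stand-in model and batch to keep the sketch runnable; in practice the
# profiled region would wrap a few real forward/backward steps.
model = nn.Linear(1024, 1024).cuda()
batch = torch.randn(8, 1024, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(5):
        loss = model(batch).sum()
        loss.backward()

# If image preprocessing or other host-side ops dominate a real run, they show
# up at the top of the CPU-time ranking instead of CUDA kernels.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=20))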

model

model_name_or_path: /root/liguochun/models/Qwen/Qwen3-VL-8B-Instruct image_max_pixels: 589824 video_max_pixels: 16384 trust_remote_code: true

method

stage: sft do_train: true finetuning_type: lora lora_rank: 8 lora_target: all

stage: sft

do_train: true

finetuning_type: full

freeze_vision_tower: true

freeze_multi_modal_projector: true

freeze_language_model: false

deepspeed: examples/deepspeed/ds_z3_config.json

dataset

dataset: click_0904_0916_train template: qwen3_vl cutoff_len: 4096 max_samples: 4000 overwrite_cache: true preprocessing_num_workers: 16 dataloader_num_workers: 4

output

output_dir: saves/Qwen3-VL-8B-Instruct/LR1.0e-4_Rank8_epoch20_batch64 logging_steps: 10 save_steps: 500 plot_loss: true overwrite_output_dir: true save_only_model: false report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]

train

per_device_train_batch_size: 1 gradient_accumulation_steps: 8 learning_rate: 1.0e-4 num_train_epochs: 20.0 lr_scheduler_type: cosine warmup_ratio: 0.1

bf16: true

pure_bf16: true ddp_timeout: 180000000 resume_from_checkpoint: null

eval

val_size: 0.1 per_device_eval_batch_size: 1 eval_strategy: steps eval_steps: 500

use_swanlab: true swanlab_run_name: Qwen3-VL-8B-Instruct/LR1.0e-4_Rank8_epoch20_batch64 # optional swanlab_api_key: xxxx

Package Version Editable project location


accelerate 1.11.0 aiofiles 24.1.0 aiohappyeyeballs 2.6.1 aiohttp 3.13.1 aiosignal 1.4.0 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.11.0 async-timeout 5.0.1 attrs 25.4.0 audioread 3.0.1 av 16.0.1 boto3 1.40.55 botocore 1.40.55 Brotli 1.1.0 certifi 2025.10.5 cffi 2.0.0 charset-normalizer 3.4.4 click 8.3.0 contourpy 1.3.2 cycler 0.12.1 datasets 4.0.0 decorator 5.2.1 dill 0.3.8 docstring_parser 0.17.0 einops 0.8.1 exceptiongroup 1.3.0 fastapi 0.119.1 ffmpy 0.6.3 filelock 3.20.0 fire 0.7.1 fonttools 4.60.1 frozenlist 1.8.0 fsspec 2025.3.0 gradio 5.45.0 gradio_client 1.13.0 groovy 0.1.2 h11 0.16.0 hf_transfer 0.1.9 hf-xet 1.1.10 httpcore 1.0.9 httpx 0.28.1 huggingface-hub 0.35.3 idna 3.11 jieba 0.42.1 Jinja2 3.1.6 jmespath 1.0.1 joblib 1.5.2 kiwisolver 1.4.9 lazy_loader 0.4 librosa 0.11.0 llamafactory 0.9.4.dev0 /root/liguochun/LLaMA-Factory-main llvmlite 0.45.1 markdown-it-py 4.0.0 MarkupSafe 3.0.3 matplotlib 3.10.7 mdurl 0.1.2 modelscope 1.31.0 mpmath 1.3.0 msgpack 1.1.2 multidict 6.7.0 multiprocess 0.70.16 networkx 3.4.2 nltk 3.9.2 numba 0.62.1 numpy 1.26.4 nvidia-cublas-cu12 12.8.4.1 nvidia-cuda-cupti-cu12 12.8.90 nvidia-cuda-nvrtc-cu12 12.8.93 nvidia-cuda-runtime-cu12 12.8.90 nvidia-cudnn-cu12 9.10.2.21 nvidia-cufft-cu12 11.3.3.83 nvidia-cufile-cu12 1.13.1.3 nvidia-curand-cu12 10.3.9.90 nvidia-cusolver-cu12 11.7.3.90 nvidia-cusparse-cu12 12.5.8.93 nvidia-cusparselt-cu12 0.7.1 nvidia-ml-py 13.580.82 nvidia-nccl-cu12 2.27.5 nvidia-nvjitlink-cu12 12.8.93 nvidia-nvshmem-cu12 3.3.20 nvidia-nvtx-cu12 12.8.90 nvitop 1.5.3 omegaconf 2.3.0 orjson 3.11.3 packaging 25.0 pandas 2.3.3 peft 0.17.1 pillow 11.3.0 pip 25.2 platformdirs 4.5.0 pooch 1.8.2 prettytable 3.16.0 propcache 0.4.1 protobuf 6.33.0 psutil 7.1.1 pyarrow 21.0.0 pycparser 2.23 pydantic 2.10.6 pydantic_core 2.27.2 pydub 0.25.1 pyecharts 2.0.9 Pygments 2.19.2 pyparsing 3.2.5 python-dateutil 2.9.0.post0 python-multipart 0.0.20 pytz 2025.2 PyYAML 6.0.3 regex 2025.10.22 requests 2.32.5 rich 13.9.4 rouge-chinese 1.0.3 ruff 0.14.1 s3transfer 0.14.0 safehttpx 0.1.6 safetensors 0.5.3 scikit-learn 1.7.2 scipy 1.15.3 semantic-version 2.10.0 sentencepiece 0.2.1 setuptools 80.9.0 shellingham 1.5.4 shtab 1.7.2 simplejson 3.20.2 six 1.17.0 sniffio 1.3.1 soundfile 0.13.1 soxr 1.0.0 sse-starlette 3.0.2 starlette 0.48.0 swanlab 0.6.12 sympy 1.14.0 termcolor 3.1.0 threadpoolctl 3.6.0 tiktoken 0.12.0 tokenizers 0.22.1 tomlkit 0.13.3 torch 2.9.0 torchvision 0.24.0 tqdm 4.67.1 transformers 4.57.1 triton 3.5.0 trl 0.9.6 typer 0.20.0 typing_extensions 4.15.0 tyro 0.8.14 tzdata 2025.2 urllib3 2.5.0 uvicorn 0.38.0 wcwidth 0.2.14 websockets 15.0.1 wheel 0.45.1 wrapt 2.0.0 xxhash 3.6.0 yarl 1.22.0

liguochun0304 avatar Oct 22 '25 06:10 liguochun0304

Same problem: 2× A100 40G, running DPO with LoRA on 1000 samples. Qwen2.5VL-7B takes about ten minutes; Qwen3VL-8B takes four hours.

sean-xr avatar Oct 22 '25 18:10 sean-xr


Image After about 9h of training it stopped: the progress no longer updates, and one of the GPUs sits at 0% utilization.

Image

mengzmd avatar Oct 23 '25 01:10 mengzmd


You are missing torchvision; try installing it.

mengzmd avatar Oct 23 '25 03:10 mengzmd

I hit the same slow-training problem after installing llama-factory's requirements. After reinstalling the official QwenVL environment, training is much faster and GPU utilization is above 90%.

weiyuann1 avatar Oct 23 '25 06:10 weiyuann1


Do you mean Qwen's official finetune code?

Kuangdd01 avatar Oct 23 '25 06:10 Kuangdd01


Could you share exactly how you did it? Thanks a lot!

sean-xr avatar Oct 23 '25 12:10 sean-xr


@sean-xr He referred to this (https://github.com/QwenLM/Qwen3-VL/tree/main/qwen-vl-finetune) I suppose, it helped me too.

kimvutht avatar Oct 27 '25 09:10 kimvutht


Thanks!

sean-xr avatar Oct 27 '25 10:10 sean-xr


So did you do the training with that framework instead of LLaMA-Factory?

kanqgg avatar Oct 27 '25 10:10 kanqgg


No, you can still use LLaMA-Factory.

What I did is very simple: in a new conda environment, first install the dependencies suggested by Qwen3-VL:

torch==2.6.0
torchvision==0.21.0
transformers==4.57.0.dev0
deepspeed==0.17.1
flash_attn==2.7.4.post1
triton==3.2.0
accelerate==1.7.0
torchcodec==0.2
peft==0.17.1

(Note: I did not install flash_attn and deepspeed, and used transformers==4.57.1)

Then you can proceed to install LLaMA-Factory's dependencies:

cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation

This worked for me and solved the training speed issue for Qwen3-VL.
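For anyone reproducing this, a quick sanity check of which versions the environment actually ended up with can save some confusion; a minimal sketch, nothing LLaMA-Factory specific:

import importlib.metadata as md
import torch

# Print the versions of the packages most relevant to this issue.
for pkg in ("torch", "torchvision", "transformers", "accelerate", "peft", "deepspeed", "flash_attn"):
    try:
        print(f"{pkg:15s} {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg:15s} not installed")

print("CUDA build:", torch.version.cuda)
print("flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())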

sean-xr avatar Oct 27 '25 15:10 sean-xr


Good! This also fixed the training speed issue for me.

nuo1nuo avatar Oct 29 '25 15:10 nuo1nuo

OK, thank you!


kanqgg avatar Oct 30 '25 01:10 kanqgg

Using the official qwen-vl-finetune dependencies from Qwen3-VL seems to solve the problem nicely for GPUs on architectures older than Blackwell. For Blackwell GPUs, however, the minimum supported PyTorch version is 2.7.1, so I downgraded torch from 2.9.0 to 2.7.1 and torchvision to 0.22.1, keeping the other dependency versions consistent with qwen-vl-finetune, but the problem persists.
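For others on Blackwell cards trying the same downgrade, it is worth confirming that the installed wheel actually targets the GPU before blaming the training stack; a minimal check is below (the exact sm_XX architecture of RTX 5090-class cards is an assumption here, check your own device).

import torch

# Confirm the torch build, CUDA version, and whether the wheel lists the GPU's architecture.
print("torch", torch.__version__, "| CUDA build", torch.version.cuda)
print("device:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
print("supported archs:", torch.cuda.get_arch_list())  # should include the card's sm_XX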

Formula24Code avatar Oct 30 '25 02:10 Formula24Code