Error when training qwen2.5-vl on NPU
The parameters are as follows:
swift sft \
--model XX/qwen25-vl-3B \
--model_type qwen2_5_vl \
--num_train_epochs 1 \
--freeze_llm False \
--freeze_vit False \
--freeze_aligner False \
--dataset $train_list \
--max_pixels 1330000 \
--max_length 4096 \
--eval_steps 1000 \
--eval_strategy no \
--save_steps 300 \
--save_total_limit 2 \
--train_type full \
--per_device_train_batch_size 1 \
--learning_rate 1e-5 \
--output_dir output \
--deepspeed zero2 \
--ddp_backend hccl \
--truncation_strategy right \
--torch_dtype float16 \
--lora_dtype float16
The error is as follows:
Traceback (most recent call last):
File "/home/ma-user/work/ms-swift-main/swift/cli/sft.py", line 16, in
The environment is as follows: absl-py 2.1.0 accelerate 1.3.0 addict 2.4.0 aiofiles 23.2.1 aiohappyeyeballs 2.4.4 aiohttp 3.11.11 aiosignal 1.3.2 aliyun-python-sdk-core 2.16.0 aliyun-python-sdk-kms 2.16.5 annotated-types 0.7.0 anyio 4.8.0 arrow 1.3.0 ascendebug 0.1.0 asttokens 2.4.1 astunparse 1.6.3 async-timeout 5.0.1 attrdict 2.0.1 attrs 23.2.0 auto_tune 0.1.0 av 14.0.1 binaryornot 0.4.4 binpacking 1.5.2 certifi 2024.8.30 cffi 1.17.1 chardet 5.2.0 charset-normalizer 3.3.2 click 8.1.7 configparser 6.0.0 contourpy 1.3.1 cookiecutter 2.6.0 cpm-kernels 1.0.11 crcmod 1.7 cryptography 3.4.7 cycler 0.12.1 dacite 1.8.1 dataflow 0.0.1 datasets 3.2.0 debugpy 1.8.5 decorator 5.1.1 deepspeed 0.16.2 dill 0.3.8 distro 1.9.0 docstring_parser 0.16 einops 0.8.0 entrypoints 0.4 esdk-obs-python 3.23.12 exceptiongroup 1.2.2 executing 2.1.0 fastapi 0.115.6 ffmpy 0.5.0 filelock 3.16.1 flatbuffers 24.12.23 fonttools 4.54.1 frozenlist 1.5.0 fsspec 2024.9.0 future 1.0.0 gast 0.6.0 google-pasta 0.2.0 gradio 5.12.0 gradio_client 1.5.4 grpcio 1.69.0 h11 0.14.0 h5py 3.12.1 hccl 0.1.0 hccl_parser 0.1 hjson 3.1.0 httpcore 1.0.7 httpx 0.28.1 huaweicloudsdkcore 3.1.94 huggingface-hub 0.27.1 idna 3.8 importlib_metadata 8.5.0 ipykernel 6.7.0 ipython 8.27.0 jedi 0.19.1 jieba 0.42.1 Jinja2 3.1.4 jiter 0.8.2 jmespath 0.10.0 joblib 1.4.2 jupyter_client 7.4.9 jupyter_core 5.7.2 keras 3.8.0 kiwisolver 1.4.8 lazy-import 0.2.2 libclang 18.1.1 llm_datadist 0.0.1 lxml 5.3.0 ma-cli 1.2.3 Markdown 3.7 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.10.1 matplotlib-inline 0.1.7 mdurl 0.1.2 ml-dtypes 0.4.1 mock 5.1.0 modelarts 1.4.28 modelscope 1.22.2 moxing-framework 2.2.8.0aa484aa mpmath 1.3.0 ms_swift 3.2.0.dev0 /home/ma-user/work/z00854892/ms-swift-main msgpack 1.1.0 msobjdump 0.1.0 multidict 6.1.0 multiprocess 0.70.16 namex 0.0.8 nest-asyncio 1.6.0 networkx 3.2.1 ninja 1.11.1.3 nltk 3.9.1 npu_bridge 1.15.0 npu_device 0.1 numpy 1.26.0 op_compile_tool 0.1.0 op_gen 0.1 op_test_frame 0.1 opc_tool 0.1.0 openai 1.59.8 opt_einsum 3.4.0 optree 0.14.0 orjson 3.10.14 oss2 2.19.1 packaging 24.1 pandas 2.2.3 parso 0.8.4 pathlib2 2.3.7.post1 peft 0.14.0 pexpect 4.9.0 pillow 10.4.0 pip 22.3.1 platformdirs 4.3.2 prettytable 3.7.0 prompt_toolkit 3.0.47 propcache 0.2.1 protobuf 3.20.3 psutil 6.0.0 ptyprocess 0.7.0 pure_eval 0.2.3 py-cpuinfo 9.0.0 pyarrow 19.0.0 pyasn1 0.5.1 pycparser 2.22 pycryptodome 3.21.0 pydantic 2.10.5 pydantic_core 2.27.2 pydub 0.25.1 Pygments 2.18.0 pyparsing 3.2.0 python-dateutil 2.9.0.post0 python-multipart 0.0.20 python-slugify 8.0.4 pytz 2024.2 PyYAML 6.0.2 pyzmq 26.2.0 qwen-vl-utils 0.0.10 regex 2024.11.6 requests 2.32.3 requests-toolbelt 1.0.0 rich 13.9.2 rouge 1.0.1 ruff 0.9.2 safehttpx 0.1.6 safetensors 0.5.2 schedule_search 0.0.1 scipy 1.14.1 semantic-version 2.10.0 sentencepiece 0.2.0 setuptools 69.5.1 shellingham 1.5.4 shtab 1.7.1 simplejson 3.19.3 six 1.16.0 sniffio 1.3.1 sortedcontainers 2.4.0 stack-data 0.6.3 starlette 0.41.3 sympy 1.13.1 tabulate 0.9.0 te 0.4.0 tenacity 8.2.2 tensorboard 2.18.0 tensorboard-data-server 0.7.2 tensorflow 2.18.0 tensorflow-io 0.37.1 tensorflow-io-gcs-filesystem 0.37.1 termcolor 2.5.0 text-unidecode 1.3 tf_keras 2.18.0 tiktoken 0.8.0 timm 1.0.13 tokenizers 0.21.0 tomlkit 0.13.2 torch 2.1.0 torch-npu 2.1.0.post8 torchvision 0.16.0 tornado 6.4.1 tqdm 4.66.5 traitlets 5.14.3 transformers 4.49.0.dev0 transformers-stream-generator 0.0.5 trl 0.15.2 typeguard 4.4.1 typer 0.15.1 types-python-dateutil 2.9.0.20241003 typing_extensions 4.12.2 tyro 0.9.11 tzdata 2024.2 urllib3 2.2.2 uvicorn 0.34.0 wcwidth 0.2.13 websockets 14.1 Werkzeug 3.1.3 wheel 0.38.4 wrapt 1.17.2 xxhash 3.5.0 yarl 1.18.3 zipp 3.21.0 zstandard 0.23.0
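Training works when the ViT is frozen (--freeze_vit True); full-parameter fine-tuning raises the error above.
Hey, did you manage to solve this problem?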
Same problem. Is this caused by the NPU operators used for ViT training (for example, a missing operator)?
Upgrade to the latest version, then try setting --fp16 false --bf16 false.
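For anyone trying this, here is a rough sketch of what the suggestion might look like when applied to the command from the original post. It only appends `--fp16 false --bf16 false` to the original arguments and prints the resulting command; it does not launch training, and it is not confirmed to fix the error. The default value given to `train_list` is a hypothetical placeholder, since the original post does not show the dataset path.

```shell
#!/usr/bin/env bash
# Sketch only: rebuild the command from the original post and append the two
# flags suggested above. Nothing is executed here; the command is just printed.

# Hypothetical placeholder: the real dataset path is not shown in the post.
train_list="${train_list:-/path/to/train.jsonl}"

cmd=(
  swift sft
  --model XX/qwen25-vl-3B
  --model_type qwen2_5_vl
  --num_train_epochs 1
  --freeze_llm False
  --freeze_vit False
  --freeze_aligner False
  --dataset "$train_list"
  --max_pixels 1330000
  --max_length 4096
  --eval_steps 1000
  --eval_strategy no
  --save_steps 300
  --save_total_limit 2
  --train_type full
  --per_device_train_batch_size 1
  --learning_rate 1e-5
  --output_dir output
  --deepspeed zero2
  --ddp_backend hccl
  --truncation_strategy right
  --torch_dtype float16
  --lora_dtype float16
  # Flags from the suggestion above (untested assumption that they help):
  --fp16 false
  --bf16 false
)

# Print the assembled command for inspection.
printf '%s ' "${cmd[@]}"
echo
```

To actually launch training after reviewing the printed command, run `"${cmd[@]}"` in the same shell.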
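Has this issue been resolved yet? I just ran into it as well while training with swift.
Has this been solved? I'm hitting the same problem, and my swift version is fairly recent too.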