DeepSpeed error
CUDA_VISIBLE_DEVICES=0,1,2 \
MAX_PIXELS=1003520 \
swift sft \
    --model /home/jdn/.cache/modelscope/hub/models/deepseek-ai/deepseek-vl2-tiny \
    --dataset /home/jdn/deepseek/save_json/xunlian_CT_and_Xray.json \
    --train_type lora \
    --torch_dtype float16 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit true \
    --gradient_accumulation_steps 16 \
    --lazy_tokenize true \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir /home/jdn/deepseek/output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 2 \
    --deepspeed zero3
The error:
Traceback (most recent call last):
File "/home/jdn/ms-swift/swift/cli/sft.py", line 11, in device_map. '
ValueError: DeepSpeed is not compatible with device_map. n_gpu: 3, local_world_size: 1.

How should I fix this?
@Jintao-Huang
NPROC_PER_NODE=3 \
I encountered the same issue; setting NPROC_PER_NODE to the same number as n_gpus worked for me.
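Applied to the original command in this issue, that would mean launching with a process count that matches the visible devices. A sketch (same paths as above, flag list abbreviated for illustration):

```shell
# Same launch as above, but with NPROC_PER_NODE matching the 3 visible GPUs,
# so swift spawns one rank per device instead of one process driving all three.
CUDA_VISIBLE_DEVICES=0,1,2 \
NPROC_PER_NODE=3 \
MAX_PIXELS=1003520 \
swift sft \
    --model /home/jdn/.cache/modelscope/hub/models/deepseek-ai/deepseek-vl2-tiny \
    --dataset /home/jdn/deepseek/save_json/xunlian_CT_and_Xray.json \
    --train_type lora \
    --torch_dtype float16 \
    --deepspeed zero3
```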
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift sft \
--model /models/Qwen3-32B \
......................
Traceback (most recent call last):
File "/usr/local/python3.10.15/lib/python3.10/site-packages/swift/cli/sft.py", line 7, in device_map. '
ValueError: DeepSpeed is not compatible with device_map. n_gpu: 2, local_world_size: 1.
[ERROR] 2025-06-25-06:18:38 (PID:257, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
ASCEND_RT_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
swift sft \
    ... \
    --train_type full \
    --torch_dtype bfloat16 \
    --num_train_epochs 2 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --device_map auto \
    --learning_rate 1e-4 \
    --target_modules all-linear \
    --freeze_vit true \
    --gradient_accumulation_steps $(expr 16 / $NPROC_PER_NODE) \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 0 \
    --model_kwargs '{"device_map": null}' \
    --deepspeed zero0
This still fails with ValueError: DeepSpeed is not compatible with device_map. n_gpu: 2, local_world_size: 1.
The root cause is process management: the NPROC_PER_NODE environment variable is not the same thing as the --nproc_per_node CLI flag. When running on multiple cards, set NPROC_PER_NODE in the launch script, and do not pass --device_map or --nproc_per_node as swift sft arguments, because those get intercepted by the transformers framework's own process handling.
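A minimal sketch of the check behind this error (hypothetical; not the actual ms-swift source). DeepSpeed shards the model across ranks itself, so the trainer rejects the device_map path whenever the number of visible devices does not match the number of locally launched processes:

```python
def check_deepspeed_compat(n_gpu: int, local_world_size: int) -> None:
    # DeepSpeed expects one launched process per visible device; a single
    # process driving several GPUs implies device_map-style sharding,
    # which conflicts with DeepSpeed's own partitioning.
    if n_gpu != local_world_size:
        raise ValueError(
            f"DeepSpeed is not compatible with device_map. "
            f"n_gpu: {n_gpu}, local_world_size: {local_world_size}.")

# Single process seeing 3 GPUs (no NPROC_PER_NODE set): raises.
try:
    check_deepspeed_compat(n_gpu=3, local_world_size=1)
    raised = False
except ValueError:
    raised = True
assert raised

# NPROC_PER_NODE=3 launches 3 local ranks, one per GPU: passes.
check_deepspeed_compat(n_gpu=3, local_world_size=3)
```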
It still errors even when those arguments are not passed.