ms-swift: uneven GPU load during multi-GPU quantization

Describe the bug: GPU load is uneven when quantizing a 32B model on 4 GPUs; quantizing a 72B model on 8 GPUs runs out of memory (OOM).

Your hardware and system info: 8 × A800 GPUs

Additional context

[Image: GPU utilization]
[Image: VRAM usage]

Oct 14 '25 03:10 zzc0430

Try --device_map cpu.

It will only use cuda:0 for quantization.

Oct 14 '25 03:10 Jintao-Huang

@Jintao-Huang VRAM OOM when using a single GPU.

Script:

OMP_NUM_THREADS=14 \
swift export \
    --model ${MODEL} \
    --quant_method gptq \
    --dataset ${DATASET} \
    --quant_n_samples 512 \
    --quant_batch_size 1 \
    --max_length 8192 \
    --quant_bits 4 \
    --device_map cpu \
    --output_dir ${OUTPUT_MODEL}

Error:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.26 GiB. GPU 0 has a total capacity of 79.35 GiB of which 1.32 GiB is free.

Oct 14 '25 05:10 zzc0430

CUDA_VISIBLE_DEVICES=2,3,4,5 \
MAX_PIXELS=117600 \
swift export \
    --model Qwen2.5-VL-7B \
    --dataset 'listwise_sft_0923-1_2.2w.sampled1000.jsonl' \
    --quant_n_samples 256 \
    --quant_batch_size -1 \
    --max_length 16384 \
    --quant_method awq \
    --quant_bits 4 \
    --output_dir /media/Qwen2.5-VL-7B-1009-4-AWQ

I'm hitting the same problem when quantizing Qwen2.5-VL-7B on 4 H800 GPUs.

Switching to --device_map cpu also results in OOM. @Jintao-Huang

Oct 23 '25 03:10 Yimi81

@Yimi81 Judging from the screenshot, this is caused by VRAM fragmentation. Try adding the environment variable PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
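
A minimal sketch of how this suggestion could be applied to the AWQ command posted earlier in the thread. The function name quantize_awq is hypothetical and used only for illustration; the model, dataset, and output path are copied from that post and would need adjusting for another setup. expandable_segments:True is an option of PyTorch's CUDA caching allocator; whether it removes an OOM depends on the workload.

```shell
# Ask PyTorch's CUDA caching allocator to use expandable segments,
# which can reduce fragmentation when allocation sizes vary a lot.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Hypothetical wrapper (the name is illustrative) around the command from this thread.
quantize_awq() {
    CUDA_VISIBLE_DEVICES=2,3,4,5 \
    MAX_PIXELS=117600 \
    swift export \
        --model Qwen2.5-VL-7B \
        --dataset 'listwise_sft_0923-1_2.2w.sampled1000.jsonl' \
        --quant_n_samples 256 \
        --quant_batch_size -1 \
        --max_length 16384 \
        --quant_method awq \
        --quant_bits 4 \
        --output_dir /media/Qwen2.5-VL-7B-1009-4-AWQ
}

# Show the value that child processes such as swift will inherit.
echo "PYTORCH_CUDA_ALLOC_CONF=${PYTORCH_CUDA_ALLOC_CONF}"
```

Calling quantize_awq in the same shell session starts the export with the variable already in its environment. Setting it before the Python process starts is the safe choice, since the allocator reads its configuration when it initializes.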

Oct 23 '25 06:10 zzc0430

@Yimi81 Have you managed to solve this?

Nov 12 '25 03:11 sunjinguo92

@sunjinguo92 After setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, the OOM no longer occurs.

Nov 12 '25 03:11 zzc0430

Thanks.

Nov 13 '25 14:11 sunjinguo92