Uneven GPU load during multi-GPU quantization
Describe the bug
Uneven GPU load when quantizing a 32B model on 4 GPUs; OOM when quantizing a 72B model on 8 GPUs.
Your hardware and system info
GPU: 8 * A800
Additional context
- GPU utilization (screenshot)
- VRAM usage (screenshot)
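For reference, a minimal way to log the same per-GPU utilization and memory readings the screenshots show (assuming nvidia-smi is available; this command is not part of the original report):

watch -n 1 nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv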
try --device_map cpu
Will only use cuda:0 for quantization.
@Jintao-Huang VRAM OOM when using a single GPU.
script
OMP_NUM_THREADS=14 \
swift export \
--model ${MODEL} \
--quant_method gptq \
--dataset ${DATASET} \
--quant_n_samples 512 \
--quant_batch_size 1 \
--max_length 8192 \
--quant_bits 4 \
--device_map cpu \
--output_dir ${OUTPUT_MODEL}
error
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.26 GiB. GPU 0 has a total capacity of 79.35 GiB of which 1.32 GiB is free.
CUDA_VISIBLE_DEVICES=2,3,4,5 \
MAX_PIXELS=117600 \
swift export \
--model Qwen2.5-VL-7B \
--dataset 'listwise_sft_0923-1_2.2w.sampled1000.jsonl' \
--quant_n_samples 256 \
--quant_batch_size -1 \
--max_length 16384 \
--quant_method awq \
--quant_bits 4 \
--output_dir /media/Qwen2.5-VL-7B-1009-4-AWQ
I'm hitting the same problem, quantizing qwen2.5-vl-7b on 4x H800.
Switching to --device_map cpu also OOMs. @Jintao-Huang
@Yimi81 Judging from the screenshot, this is caused by VRAM fragmentation. Try adding the environment variable PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
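For example, a sketch of that suggestion applied to the AWQ command above (all flags and paths are taken from the earlier comment and may differ in your setup):

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
CUDA_VISIBLE_DEVICES=2,3,4,5 \
MAX_PIXELS=117600 \
swift export \
--model Qwen2.5-VL-7B \
--dataset 'listwise_sft_0923-1_2.2w.sampled1000.jsonl' \
--quant_n_samples 256 \
--quant_batch_size -1 \
--max_length 16384 \
--quant_method awq \
--quant_bits 4 \
--output_dir /media/Qwen2.5-VL-7B-1009-4-AWQ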
Have you solved it?
@sunjinguo92 After setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True it no longer OOMs.
Thanks.