q-diffusion
Why does this quantized model need more than 24GB of GPU memory, when the ideal size is around 500MB?
1. Question
As we know, SD v1.5 has about 1 billion parameters, and its peak GPU memory at fp32 precision is about 4GB. So the memory at int4 precision (sd_w4a8_chpt.pth) should be about 4GB/8 = 500MB. However, when I load and run your w4a8 quantized model, it consumes more than 24GB of GPU memory, and we finally get an OOM!
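For reference, a minimal back-of-the-envelope sketch of the expected weight storage (checkpoint size only, not runtime memory) at different precisions, using the 859.52M UNet parameter count reported in the log below; the figures are illustrative, not measured.

```python
# Rough estimate of weight storage at different precisions.
# 859.52M is the UNet parameter count reported by DiffusionWrapper in the log;
# runtime memory additionally holds activations, the text encoder, the VAE,
# and quantizer state, so this is only a lower bound.
params = 859.52e6

bytes_per_param = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

for precision, nbytes in bytes_per_param.items():
    size_gb = params * nbytes / 1024**3
    print(f"{precision}: ~{size_gb:.2f} GB")

# fp32: ~3.20 GB ... int4: ~0.40 GB, i.e. roughly fp32 / 8, in line with 4GB/8 ≈ 500MB
```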
2. My command:
python txt2img.py --prompt "a puppet wearing a hat" --plms --cond --ptq --weight_bit 4 --quant_mode qdiff --no_grad_ckpt --split --n_samples 5 --quant_act --act_bit 8 --sm_abit 16 --outdir ./data/ --cali_ckpt ../sd_w4a8_ckpt-001.pth
3. Error log:
07/31/2023 11:16:03 - INFO - root - Loading model from models/ldm/stable-diffusion-v1/model.ckpt
07/31/2023 11:16:04 - INFO - root - Global Step: 470000
07/31/2023 11:16:04 - INFO - torch.distributed.nn.jit.instantiator - Created a temporary directory at /tmp/tmpmwfx988m
07/31/2023 11:16:04 - INFO - torch.distributed.nn.jit.instantiator - Writing /tmp/tmpmwfx988m/_remote_module_non_scriptable.py
LatentDiffusion: Running in eps-prediction mode
07/31/2023 11:16:07 - INFO - ldm.util - DiffusionWrapper has 859.52 M params.
07/31/2023 11:16:07 - INFO - ldm.modules.diffusionmodules.model - making attention of type 'vanilla' with 512 in_channels
07/31/2023 11:16:07 - INFO - ldm.modules.diffusionmodules.model - Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
07/31/2023 11:16:07 - INFO - ldm.modules.diffusionmodules.model - making attention of type 'vanilla' with 512 in_channels
07/31/2023 11:16:12 - INFO - main - Not use gradient checkpointing for transformer blocks
Loading quantized model checkpoint
Initializing weight quantization parameters
07/31/2023 11:16:27 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:28 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:28 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:29 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:32 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:34 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:37 - INFO - qdiff.quant_layer - split at 1280!
07/31/2023 11:16:38 - INFO - qdiff.quant_layer - split at 640!
07/31/2023 11:16:39 - INFO - qdiff.quant_layer - split at 640!
07/31/2023 11:16:40 - INFO - qdiff.quant_layer - split at 640!
07/31/2023 11:16:41 - INFO - qdiff.quant_layer - split at 320!
07/31/2023 11:16:42 - INFO - qdiff.quant_layer - split at 320!
Initializing act quantization parameters
Traceback (most recent call last):
File "txt2img.py", line 444, in
A 32GB V100 also gets OOM.
A 40GB A100 also hits OOM. This is likely because the fake-quantization operations require their own intermediate tensors to be allocated. You can reduce "n_samples" to counter this; for example, n_samples=1 only needs about 20GB.
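To illustrate that point, here is a minimal, hypothetical sketch (not the actual q-diffusion quantizer) of why simulated (fake) quantization does not shrink GPU memory: weights are rounded to a low-bit grid and immediately dequantized back to fp32, so both the original tensor and the dequantized copy live in full precision on the device.

```python
import torch

def fake_quantize(w: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Simulated (fake) quantization: round to an n-bit integer grid,
    then dequantize back to float. The output is still an fp32 tensor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax                       # per-tensor scale (illustrative)
    w_int = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return w_int * scale                               # dequantized -> fp32 again

device = "cuda" if torch.cuda.is_available() else "cpu"
w = torch.randn(1280, 1280, device=device)             # an fp32 weight tensor
w_q = fake_quantize(w)

# Both w and w_q are full fp32 tensors: memory is ~2x the original weight,
# not 1/8 of it, and every quantized layer adds such intermediates.
print(w.element_size() * w.nelement() / 2**20, "MiB per copy")
```

Real memory savings would require storing packed low-bit weights and dequantizing on the fly (or using low-bit kernels), which a simulated PTQ pipeline like this typically does not do.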
For me, it's not effective. Even with n_samples=1 on a 32GB V100, it still leads to OOM. This is the launch script: python scripts/txt2img.py --prompt "a puppet wearing a hat" --plms --cond --ptq --weight_bit 4 --quant_mode qdiff --no_grad_ckpt --split --n_samples 1 --quant_act --act_bit 8 --sm_abit 16 --outdir ./data/ --cali_ckpt models/sd_w4a8.pth --resume
I also encountered this problem. May I know how you finally solved it?