Activation quantization of Stable Diffusion.
Weight quantization of Stable Diffusion (W8A16) causes essentially no loss of quality, but as soon as the activations are quantized as well (W8A8), even with dynamic quantization, the quality of the generated images degrades severely.
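For reference, the kind of W8A8 activation quantization I mean is roughly the following (a minimal sketch in plain PyTorch, not the repo's qdiff code):

```python
import torch

def dynamic_quant_dequant(x, n_bits=8):
    # Symmetric per-tensor fake quantization; the scale is recomputed from
    # the current tensor on every call ("dynamic" quantization).
    qmax = 2 ** (n_bits - 1) - 1                   # 127 for int8
    scale = x.abs().max().clamp(min=1e-8) / qmax   # per-tensor scale
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return x_q * scale                             # dequantize back to float

# A few large outliers (common in UNet feature maps) inflate the scale and
# wipe out resolution for the bulk of small activation values.
x = torch.randn(2, 320, 64, 64)
x[0, 0, 0, 0] = 80.0
print((dynamic_quant_dequant(x) - x).abs().mean())
```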
In your paper you mention using BRECQ to reconstruct the transformer blocks and ResNet blocks separately, but in my experiments the reconstruction loss did not converge (see the sketch below for the kind of setup I mean).
Do you have any good suggestions?
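To be concrete, this is roughly what I understand the per-block reconstruction to be doing (an assumed sketch with an LSQ-style learnable step size; `fp_block`, `q_block`, `act_step`, and `calib_inputs` are placeholder names, not identifiers from the repo or the paper):

```python
import torch

def fake_quant(x, step, n_bits=8):
    # Uniform symmetric fake quantization with a straight-through estimator,
    # so gradients can flow back to the learnable step size.
    qmax = 2 ** (n_bits - 1) - 1
    x_div = x / step
    x_int = x_div + (torch.round(x_div) - x_div).detach()   # STE round
    return torch.clamp(x_int, -qmax - 1, qmax) * step

def reconstruct_block(fp_block, q_block, act_step, calib_inputs,
                      iters=5000, batch_size=8, lr=1e-4):
    # Minimize the MSE between the quantized block's output and the
    # full-precision block's output on cached calibration inputs.
    opt = torch.optim.Adam([act_step], lr=lr)
    for i in range(iters):
        idx = torch.randint(len(calib_inputs), (batch_size,))
        x = calib_inputs[idx]
        with torch.no_grad():
            target = fp_block(x)                  # full-precision reference
        out = q_block(fake_quant(x, act_step))    # quantized-activation path
        loss = torch.nn.functional.mse_loss(out, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if (i + 1) % 500 == 0:
            print(f"iter {i + 1}: rec loss {loss.item():.3f}")
    return act_step
```

Here `act_step` would be something like `torch.nn.Parameter(calib_inputs.abs().max() / 127)` before reconstruction starts; the question is why this loss stays flat for activation quantizers.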
Hi,
Have you solved this issue? I have found that reconstruction with activation quantization generally does not converge, and sometimes the model even performs worse after reconstruction than before it.
For example:
10/12/2024 01:05:28 - INFO - main - Reconstruction for layer conv
10/12/2024 01:05:39 - INFO - qdiff.utils - in shape: torch.Size([5120, 256, 17, 17])
10/12/2024 01:05:39 - INFO - qdiff.utils - out shape: torch.Size([5120, 256, 8, 8])
10/12/2024 01:05:41 - INFO - qdiff.layer_recon - Total loss: 0.291 (rec:0.291, round:0.000) b=0.00 count=500
10/12/2024 01:05:44 - INFO - qdiff.layer_recon - Total loss: 0.281 (rec:0.281, round:0.000) b=0.00 count=1000
10/12/2024 01:05:46 - INFO - qdiff.layer_recon - Total loss: 0.218 (rec:0.218, round:0.000) b=0.00 count=1500
10/12/2024 01:05:48 - INFO - qdiff.layer_recon - Total loss: 0.296 (rec:0.296, round:0.000) b=0.00 count=2000
10/12/2024 01:05:51 - INFO - qdiff.layer_recon - Total loss: 0.221 (rec:0.221, round:0.000) b=0.00 count=2500
10/12/2024 01:05:53 - INFO - qdiff.layer_recon - Total loss: 0.260 (rec:0.260, round:0.000) b=0.00 count=3000
10/12/2024 01:05:55 - INFO - qdiff.layer_recon - Total loss: 0.223 (rec:0.223, round:0.000) b=0.00 count=3500
10/12/2024 01:05:57 - INFO - qdiff.layer_recon - Total loss: 0.233 (rec:0.233, round:0.000) b=0.00 count=4000
10/12/2024 01:06:00 - INFO - qdiff.layer_recon - Total loss: 0.291 (rec:0.291, round:0.000) b=0.00 count=4500
10/12/2024 01:06:02 - INFO - qdiff.layer_recon - Total loss: 0.314 (rec:0.314, round:0.000) b=0.00 count=5000
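As a quick sanity check on logs like this, I smooth the reported losses (this helper is my own, not part of qdiff) to see whether they trend downward at all; here they just hover around the same level:

```python
def ema(values, alpha=0.1):
    # Exponential moving average over the logged "rec" losses.
    smoothed, avg = [], values[0]
    for v in values:
        avg = alpha * v + (1 - alpha) * avg
        smoothed.append(avg)
    return smoothed

# Loss values taken from the log above:
losses = [0.291, 0.281, 0.218, 0.296, 0.221, 0.260, 0.223, 0.233, 0.291, 0.314]
print([f"{v:.3f}" for v in ema(losses)])  # stays around 0.27-0.29 -> no real downward trend
```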