quanto
Why is the quantized net slower?
batch_size: 1, torch_dtype: fp32, unet_dtype: int8 in 3.754 seconds. Memory: 5.240GB.
batch_size: 1, torch_dtype: fp32, unet_dtype: None in 3.378 seconds. Memory: 6.073GB.
I'm using the example code for stable diffusion, but inference is slower with the quantized int8 version (I've also tested the speed on my own model, and quantization brings higher VRAM usage and slower inference). Why is that the case?
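For reference, a minimal sketch of the kind of benchmark being run (this assumes the optimum-quanto `quantize`/`freeze` API, the SD 1.5 checkpoint, and 20 inference steps; the exact checkpoint and step count are illustrative, not taken from the original report):

```python
import time
import torch
from diffusers import StableDiffusionPipeline
from optimum.quanto import quantize, freeze, qint8  # assuming the optimum-quanto API

# Assumption: SD 1.5 checkpoint; any diffusers pipeline with a UNet works the same way.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
).to("cuda")

# Quantize only the UNet weights to int8, then freeze to materialize the quantized weights.
quantize(pipe.unet, weights=qint8)
freeze(pipe.unet)

prompt = "a photo of an astronaut riding a horse"

# Warm-up run so one-time CUDA initialization is not counted in the timing.
pipe(prompt, num_inference_steps=20)

torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
start = time.perf_counter()
pipe(prompt, num_inference_steps=20)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

print(
    f"batch_size: 1, torch_dtype: fp32, unet_dtype: int8 in {elapsed:.3f} seconds. "
    f"Memory: {torch.cuda.max_memory_allocated() / 1024**3:.3f}GB."
)
```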
same observation
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Same observation on an NVIDIA A30.