
Why is the quantized net slower?

theguardsgod opened this issue 10 months ago

batch_size: 1, torch_dtype: fp32, unet_dtype: int8 in 3.754 seconds. Memory: 5.240GB.

batch_size: 1, torch_dtype: fp32, unet_dtype: None in 3.378 seconds. Memory: 6.073GB.

I'm using the example code for Stable Diffusion, but inference is slower for the quantized int8 version (I've also tested the speed on my own model, and quantization leads to higher VRAM usage and slower inference). Why is that the case?

theguardsgod · Apr 20 '24 07:04

Same observation here.

canamika27 · May 07 '24 11:05

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Jun 07 '24 01:06

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] · Jun 13 '24 01:06

Same observation on an NVIDIA A30.

newgrit1004 · Jul 18 '24 23:07