q-diffusion
Question about the inference process
Thank you for the great work! After reading the paper and reproducing the results, I have a question about the inference part.
Inference with a quantized model should only require the quantized checkpoint, so why do we need to load the FP32 model first?
Take txt2img.py for example: why does the script load the original FP32 checkpoint, i.e. sd-v1-4.ckpt, and then load the quantized checkpoint, i.e. sd_w8a8_ckpt.pth, to run inference?
The relevant code is at https://github.com/Xiuyu-Li/q-diffusion/blob/94fd0ecabc6e7545208c4809d84df091999ce4ad/scripts/txt2img.py#L311, which loads the full-precision model before the quantized one.
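For context, here is a minimal sketch of the two-stage loading pattern the question is about, assuming the common PyTorch idiom where the FP32 checkpoint is used to instantiate and initialize the module graph, after which a second `load_state_dict` overwrites those parameters with the quantized checkpoint's weights. `TinyUNet` and `load_two_stage` are hypothetical names for illustration, not the actual q-diffusion code:

```python
import torch
import torch.nn as nn


class TinyUNet(nn.Module):
    # Stand-in for the diffusion model; NOT the real q-diffusion architecture.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


def load_two_stage(fp32_path: str, quant_path: str) -> TinyUNet:
    model = TinyUNet()
    # Step 1: load the full-precision checkpoint to build/initialize
    # the complete module graph with matching parameter shapes.
    model.load_state_dict(torch.load(fp32_path))
    # Step 2: overwrite those parameters with the quantized checkpoint.
    # strict=False tolerates a checkpoint that covers only a subset of keys
    # (e.g. only the layers that were quantized).
    model.load_state_dict(torch.load(quant_path), strict=False)
    return model
```

Under this pattern the FP32 load serves only to construct and seed the model; every parameter present in the quantized checkpoint replaces its FP32 counterpart, so the question reduces to whether the FP32 step could be skipped by instantiating the architecture from config alone.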