stable-diffusion-webui
[Performance 6/6] Add --precision half option to avoid casting during inference
Description
According to https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/716#discussioncomment-9342622 , casting during inference is a major source of performance overhead. ComfyUI and Forge do fp16 inference without any casting by default, i.e. all tensors are already fp16 before inference begins. The casting overhead is roughly 50 ms/it.
This PR adds a --precision half option that disables autocasting and uses fp16 tensors throughout inference.
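For illustration, here is a minimal sketch (assuming a CUDA device and a toy module standing in for the UNet; not the PR's actual code path) of the difference between autocast-based inference and inference on a module that is already entirely fp16:

import torch

model = torch.nn.Linear(64, 64).cuda()    # hypothetical stand-in for the UNet
x = torch.randn(1, 64, device="cuda")

# Default behaviour: weights may stay fp32 and autocast inserts casts around each op.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)

# The idea behind --precision half: convert everything to fp16 up front,
# so no per-op casting happens inside the sampling loop.
model_fp16 = model.half()
y = model_fp16(x.half())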
Screenshots/videos:
Checklist:
- [x] I have read the contributing wiki page
- [x] I have performed a self-review of my own code
- [x] My code follows the style guidelines
- [x] My code passes tests
Will force-fp16 mode conflict with fp8 unet?
I'm not sure if this is related to using dynamic LoRA weights, but I got this error:
File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "H:\AItest\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 522, in network_Conv2d_forward
return originals.Conv2d_forward(self, input)
File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same
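The error itself is easy to reproduce outside webui and is not specific to LoRA: any layer whose weights and bias are half precision rejects a float32 input. A minimal sketch (toy layer, hypothetical shapes):

import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3).half()   # weights and bias become fp16
x = torch.randn(1, 3, 32, 32)                         # input is still fp32

# Raises a dtype-mismatch RuntimeError like the one in the traceback above.
conv(x)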
I wonder if it's related to this: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/12205
Enabling --precision half breaks SD1.5 with the error mentioned here
Found the offending line. In ldm's openaimodel.py L795 in the UNetModel class we have:
h = x.type(self.dtype)
while in sgm it is simply:
# h = x.type(self.dtype)
h = x
self.dtype is set when the model is constructed with use_fp16. When enabling force_fp16, we need to make sure the model's dtype is set to fp16. The fact that this works with SDXL is purely an accident due to the missing cast.
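To make the mechanism concrete, here is a stripped-down sketch (a toy module mimicking the pattern, not a copy of ldm's code): when use_fp16 is not set, self.dtype stays float32 and the h = x.type(self.dtype) line casts the half-precision input back to float, which then collides with the fp16 weights.

import torch

class TinyUNetLike(torch.nn.Module):
    # Mimics how ldm's UNetModel derives its dtype from use_fp16.
    def __init__(self, use_fp16=False):
        super().__init__()
        self.dtype = torch.float16 if use_fp16 else torch.float32
        self.conv = torch.nn.Conv2d(4, 4, 3, padding=1)

    def forward(self, x):
        h = x.type(self.dtype)   # ldm casts here; sgm leaves h = x untouched
        return self.conv(h)

# With --precision half the weights are fp16, but if the config never set
# use_fp16 the input is cast back to fp32 and the conv raises a dtype mismatch.
model = TinyUNetLike(use_fp16=False).half()
try:
    model(torch.randn(1, 4, 8, 8).half())
except RuntimeError as e:
    print(e)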
I don't know if it's the appropriate place to put it, but setting use_fp16 in sd_models.repair_config fixed SD1.5 inference with this PR for me.
I don't know if it's the appropriate place to put it, but setting use_fp16 in sd_models.repair_config fixed SD1.5 inference with this PR for me.
something like this?
def repair_config(sd_config):
    if not hasattr(sd_config.model.params, "use_ema"):
        sd_config.model.params.use_ema = False

    if hasattr(sd_config.model.params, 'unet_config'):
        if shared.cmd_opts.no_half:
            sd_config.model.params.unet_config.params.use_fp16 = False
        elif shared.cmd_opts.upcast_sampling or shared.cmd_opts.precision == "half":
            sd_config.model.params.unet_config.params.use_fp16 = True
This does fix the dtype mismatch error.
Thanks for digging out the fix! Verified that it works.
I'm still getting the following runtime error with both SDXL and SD15 models:
File "T:\code\python\automatic-stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 984, in forward
emb = self.time_embed(t_emb)
File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
input = module(input)
File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "T:\code\python\automatic-stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 508, in network_Linear_forward
return originals.Linear_forward(self, input)
File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
Seems to be related to --precision half. Anyone else getting this?
Can you share which models you used? I'm not sure whether, if you load a full-precision model, the weights get cast to fp16 before inference. The models I tested were already half precision.
Can you share which models you used? I'm not sure whether, if you load a full-precision model, the weights get cast to fp16 before inference. The models I tested were already half precision.
Sure, I tried a few:
- anyloraCheckpoint_bakedvaeBlessedFp16.safetensors [ef49fbb25f]
- v1-5-pruned.safetensors [1a189f0be6]
- cyberrealisticPony_v20a.safetensors [41e77f7657]
Same error regardless of checkpoint. It probably has something to do with my environment, although I'm not sure what yet. Here's a bit more context:
- All extensions disabled aside from built-ins.
- Not using any LoRAs or extra networks.
- Tried a bunch of different samplers and schedulers.
- Using commandline args:
--precision half --ckpt-dir "S:/stable_diffusion/checkpoints" --lora-dir "S:/stable_diffusion/lora"
- Installed via your bundle PR
I'll write back if I figure out the cause.
I’ve tested this on a 6700 XT and there is a performance improvement. However, I think that this should not disallow setting --no-half-vae. On my card, running VAE in fp16 always produces black images. So, the only way to get correct images with --precision half is to enable NaN checks and rely on A1111 automatic fallback to fp32 VAE decoding, which negates some of the performance gains from this PR.
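For reference, the fallback path described above could be sketched roughly like this (hypothetical helper, not webui's actual implementation): decode in fp16 first and redo the decode in fp32 only when the result contains NaNs.

import torch

def decode_with_fallback(vae: torch.nn.Module, latents: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch: try an fp16 decode, redo it in fp32 if NaNs appear.
    image = vae.half()(latents.half())
    if torch.isnan(image).any():
        # The fp16 VAE produced NaNs (the black-image case), so decode again in full precision.
        image = vae.float()(latents.float())
    return image

This trades an occasional extra fp32 decode for correctness on cards where the fp16 VAE overflows, which is the cost the comment above refers to.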
Another report of the fp8 issue:
- https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/16122
Using the fp16 VAE, I got almost double the speed compared to --no-half-vae. Nice!
FP16 VAE is mandatory