
[Performance 6/6] Add --precision half option to avoid casting during inference

Open huchenlei opened this issue 1 year ago • 10 comments

Description

According to https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/716#discussioncomment-9342622 , casting during inference is a major source of performance overhead. ComfyUI and Forge by default do fp16 inference without any casting, i.e. all tensors are already fp16 before inference. The casting overhead is ~50ms/it.

This PR adds a --precision half option that disables autocasting and runs the entire inference in fp16.
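Conceptually, the difference between the two modes looks roughly like this (a minimal PyTorch sketch, not webui's actual code path):

    import torch

    # Autocast mode: weights stay fp32 and every op casts on the fly.
    model = torch.nn.Linear(4096, 4096).cuda()            # fp32 weights
    x = torch.randn(8, 4096, device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        y = model(x)                                      # per-op casting overhead

    # --precision half: weights and activations are fp16 up front,
    # so no casting happens inside the forward pass.
    model = model.half()
    x = x.half()
    y = model(x)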

Screenshots/videos:

[screenshot attachment]


huchenlei avatar May 17 '24 00:05 huchenlei

Will force-fp16 mode conflict with the fp8 UNet option?

SLAPaper avatar May 17 '24 03:05 SLAPaper

I'm not sure if this is related to using dynamic LoRA weights, but I got this error:

      File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
        return forward_call(*args, **kwargs)
      File "H:\AItest\stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 522, in network_Conv2d_forward
        return originals.Conv2d_forward(self, input)
      File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "H:\AItest\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same

I wonder if it's related to https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/12205.
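For what it's worth, the same class of error reproduces outside webui whenever an fp32 input meets fp16 weights (a minimal sketch):

    import torch
    import torch.nn.functional as F

    conv = torch.nn.Conv2d(3, 8, 3).half()     # fp16 weight and bias
    x = torch.randn(1, 3, 32, 32)              # fp32 input
    F.conv2d(x, conv.weight, conv.bias)        # raises a dtype-mismatch RuntimeError like the one above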

AG-w avatar May 17 '24 05:05 AG-w

Enabling --precision half breaks SD1.5 with the error mentioned above.

feffy380 avatar May 17 '24 07:05 feffy380

Found the offending line. In ldm's openaimodel.py (L795, in the UNetModel class) we have:

        h = x.type(self.dtype)

while in sgm it is simply:

        # h = x.type(self.dtype)
        h = x

self.dtype is set when the model is constructed with use_fp16. When enabling force_fp16, we need to make sure the model's dtype is also set to fp16. The fact that it works with SDXL is purely an accident of the missing cast.
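In other words (a heavily simplified stand-in for the ldm class, not the actual code):

    import torch

    class UNetModel(torch.nn.Module):
        def __init__(self, use_fp16=False):
            super().__init__()
            # ldm stores the configured dtype on the module at construction time
            self.dtype = torch.float16 if use_fp16 else torch.float32
            self.conv = torch.nn.Conv2d(4, 4, 3, padding=1)

        def forward(self, x):
            h = x.type(self.dtype)   # the offending cast
            return self.conv(h)

    model = UNetModel(use_fp16=False).half()   # weights are fp16, but self.dtype is still fp32
    x = torch.randn(1, 4, 8, 8).half()
    model(x)                                   # input upcast to fp32 -> mismatch with fp16 weights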

I don't know if it's the appropriate place to put it, but setting use_fp16 in sd_models.repair_config fixed SD1.5 inference with this PR for me.

feffy380 avatar May 17 '24 11:05 feffy380

I don't know if it's the appropriate place to put it, but setting use_fp16 in sd_models.repair_config fixed SD1.5 inference with this PR for me.

something like this?

def repair_config(sd_config):

    if not hasattr(sd_config.model.params, "use_ema"):
        sd_config.model.params.use_ema = False

    if hasattr(sd_config.model.params, 'unet_config'):
        # keep the config's dtype flag in sync with the requested precision so
        # ldm's UNetModel casts its input to the dtype the weights actually use
        if shared.cmd_opts.no_half:
            sd_config.model.params.unet_config.params.use_fp16 = False
        elif shared.cmd_opts.upcast_sampling or shared.cmd_opts.precision == "half":
            sd_config.model.params.unet_config.params.use_fp16 = True

This does fix the dtype mismatch error.

AG-w avatar May 17 '24 12:05 AG-w

something like this? […] this does fix the dtype mismatch error

Thanks for digging out the solution! Verified that the solution works.

huchenlei avatar May 17 '24 17:05 huchenlei

I'm still getting the following runtime error with both SDXL and SD15 models:

      File "T:\code\python\automatic-stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 984, in forward
        emb = self.time_embed(t_emb)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
        input = module(input)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "T:\code\python\automatic-stable-diffusion-webui\extensions-builtin\Lora\networks.py", line 508, in network_Linear_forward
        return originals.Linear_forward(self, input)
      File "T:\code\python\automatic-stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
        return F.linear(input, self.weight, self.bias)
    RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

Seems to be related to --precision half. Anyone else getting this?

ThereforeGames avatar May 17 '24 19:05 ThereforeGames

I'm still getting the following runtime error with both SDXL and SD15 models: […] Seems to be related to --precision half. Anyone else getting this?

Can you share which model you used? If you load a full-precision model, I'm not sure whether the weights get cast to fp16 before inference. The models I tested were already half precision.
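If casting at load time is the concern, it could look roughly like this (a hypothetical helper, not necessarily what this PR does):

    import torch

    def load_checkpoint_half(path):
        # Cast every floating-point tensor in a (possibly fp32) checkpoint
        # to fp16 so that nothing needs to be cast during inference.
        state_dict = torch.load(path, map_location="cpu")
        return {
            k: v.half() if torch.is_tensor(v) and v.is_floating_point() else v
            for k, v in state_dict.items()
        }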

huchenlei avatar May 17 '24 20:05 huchenlei

Can you share which model you used? If you load a full-precision model, I'm not sure whether the weights get cast to fp16 before inference. The models I tested were already half precision.

Sure, I tried a few:

  • anyloraCheckpoint_bakedvaeBlessedFp16.safetensors [ef49fbb25f]
  • v1-5-pruned.safetensors [1a189f0be6]
  • cyberrealisticPony_v20a.safetensors [41e77f7657]

Same error regardless of checkpoint. It probably has something to do with my environment, although I'm not sure what yet. Here's a bit more context:

  • All extensions disabled aside from built-ins.
  • Not using any LoRAs or extra networks.
  • Tried a bunch of different samplers and schedulers.
  • Using commandline args: --precision half --ckpt-dir "S:/stable_diffusion/checkpoints" --lora-dir "S:/stable_diffusion/lora"
  • Installed via your bundle PR

I'll write back if I figure out the cause.

ThereforeGames avatar May 17 '24 20:05 ThereforeGames

I’ve tested this on a 6700 XT and there is a performance improvement. However, I don't think this should disallow setting --no-half-vae: on my card, running the VAE in fp16 always produces black images. So the only way to get correct images with --precision half is to enable NaN checks and rely on A1111's automatic fallback to fp32 VAE decoding, which negates some of the performance gains from this PR.
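The fallback being described amounts to something like this (a sketch assuming a hypothetical vae.decode interface, not webui's exact code):

    import torch

    def decode_with_fallback(vae, latents):
        # Try the fast fp16 decode first; if the output contains NaNs
        # (black image), redo the decode in fp32.
        image = vae.decode(latents.half())
        if torch.isnan(image).any():
            image = vae.float().decode(latents.float())
        return image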

Arvamer avatar May 18 '24 08:05 Arvamer

Another report of the fp8 issue:

  • https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/16122

w-e-w avatar Jul 02 '24 04:07 w-e-w

Using an FP16 VAE I got almost double the speed compared to --no-half-vae, nice.

An FP16 VAE is mandatory.

FurkanGozukara avatar Jul 31 '24 12:07 FurkanGozukara