
[bug]: Tiled decoding ruins the image

Open

seinan9 opened this issue 1 year ago • 7 comments

Is there an existing issue for this problem?

  • [X] I have searched the existing issues

Operating system

Windows

GPU vendor

Nvidia (CUDA)

GPU model

RTX 4060 TI

GPU VRAM

16GB

Version number

4.0.2

Browser

Google Chrome 123.0.6312.105

Python dependencies

{ "accelerate": "0.28.0", "compel": "2.0.2", "cuda": "12.1", "diffusers": "0.27.2", "numpy": "1.26.4", "opencv": "4.9.0.80", "onnx": "1.15.0", "pillow": "10.3.0", "python": "3.10.6", "torch": "2.2.1+cu121", "torchvision": "0.17.1+cu121", "transformers": "4.39.1", "xformers": "0.0.25" }

What happened

The image quality is significantly worse when force_tiled_decode is set to true. This is slightly noticeable during the initial generation and much more so when upscaling: certain parts look oversaturated. There is a clear difference between the images below (the first with tiled decode off, the second with tiled decode on). In the second image, the upper part (probably the first tile) is also less affected than the bottom part (probably the second tile).

[Images: tiled_decode_off, tiled_decode_on]

What you expected to happen

I expected the images to be decoded normally, without oversaturation in certain areas.

How to reproduce the problem

Set the force_tiled_decode option in invokeai.yaml to true, start the application, and generate an image (the problem is easier to spot with realistic images and when upscaling).

Additional context

The examples were generated with epicrealism (SD 1.5), but the problem is reproducible with other models as well; it is easiest to spot with realistic models. This does not happen in InvokeAI version 3.7.

The initial image was generated with the following parameters:

  • Generation Mode: txt2img
  • Positive Prompt: photo of a viking, short hair, oversized sweater, close up, fierce, male
  • Negative Prompt: (low quality)1.4
  • Model: epicrealism (SD-1)
  • Width: 512
  • Height: 768
  • Seed: 2926731161
  • Steps: 25
  • Scheduler: dpmpp_2m_k
  • CFG scale: 8
  • CFG Rescale Multiplier: 0

Afterwards it was upscaled to 640x960 via img2img with a denoising strength of 0.55, keeping all other parameters the same.

Discord username

seinan9

(seinan9, Apr 04 '24 18:04)

The handling of tiled decode hasn't changed in some time (several months). This functionality is handled entirely by diffusers, and their implementation also appears not to have changed in months.

It's possible there was some change in another area of diffusers or Invoke that indirectly affects how tiled decoding is handled.

However, slight changes like this are a known effect of tiled decoding. Because the model doesn't have the full context of the image, the tiled decode is expected to show measurable and sometimes visible differences. There's some discussion here, though the example images appear to be missing now.

A more convincing comparison would be between a v3.7.0 image with tiled decode vs v4 image with tiled decode (no upscaling please, that adds another variable to the equation). Ideally a few comparisons.

(psychedelicious, Apr 05 '24 04:04)
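One commonly cited reason why tiles lacking full-image context can shift colors (our reading, not something established in this thread) is that the Stable Diffusion VAE decoder contains GroupNorm layers, which compute their mean and variance over whatever input they receive. When decoding tile by tile, each tile is normalized with its own statistics instead of the full image's. The toy sketch below shows the effect on a 1-D signal; `normalize` and `normalize_tiled` are hypothetical stand-ins (no learned scale or shift, no overlap), not diffusers code.

```python
from statistics import fmean, pstdev


def normalize(values, eps=1e-6):
    """Zero-mean, unit-variance normalization over all of `values`.

    Hypothetical stand-in for a single GroupNorm group; the real layer also
    applies a learned per-channel scale and shift.
    """
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def normalize_tiled(values, tile):
    """Normalize each non-overlapping tile independently, so every tile is
    scaled by its own statistics rather than those of the whole signal."""
    out = []
    for start in range(0, len(values), tile):
        out.extend(normalize(values[start:start + tile]))
    return out


# A 1-D "feature map" whose second half is uniformly brighter than the first.
signal = [0.1 * i for i in range(8)] + [2.0 + 0.1 * i for i in range(8)]

whole = normalize(signal)
tiled = normalize_tiled(signal, tile=8)
max_gap = max(abs(a - b) for a, b in zip(whole, tiled))
print(f"largest per-element difference: {max_gap:.2f}")
```

In the tiled result both halves come out identical, so the brightness difference between them is erased, while the whole-signal result keeps it. A real decoder adds learned scale and shift plus overlap blending, but its normalization statistics would still be computed per tile.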

It is less visible during the first pass, since there is typically only a single tile (two at most if the resolution is set a bit higher). Still, here are two more examples without upscaling (512x768). Images 1 and 3 were generated with InvokeAI 3.7, while 2 and 4 were generated with InvokeAI 4.0.2. All images use the same parameters.

[Images: invoke37_tiled_decode_on_0, invoke402_tiled_decode_on_0, invoke37_tiled_decode_on_1, invoke402_tiled_decode_on_1]

(seinan9, Apr 05 '24 12:04)

Thanks for those examples. It's still very noticeable. I think we need to test this with diffusers (i.e. via separate script, not within invoke) to confirm where the problem is.

(psychedelicious, Apr 05 '24 20:04)
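To make the standalone comparison suggested above measurable, and to check the report that the lower part of the image is affected more than the upper part, a band-wise difference can help. The sketch below is a hypothetical helper, `band_differences`. It assumes the two decodes of the same latents (for example, one after diffusers' `vae.enable_tiling()` and one after `vae.disable_tiling()`) have already been saved and loaded as rows of RGB tuples; that loading step is not shown.

```python
def band_differences(img_a, img_b, n_bands):
    """Mean absolute per-channel difference between two same-sized images,
    reported for `n_bands` horizontal bands from top to bottom.

    Each image is a list of rows; each row is a list of (R, G, B) tuples.
    """
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same dimensions")

    height = len(img_a)
    results = []
    for band in range(n_bands):
        top = band * height // n_bands
        bottom = (band + 1) * height // n_bands
        total, count = 0, 0
        for row_a, row_b in zip(img_a[top:bottom], img_b[top:bottom]):
            for px_a, px_b in zip(row_a, row_b):
                total += sum(abs(ca - cb) for ca, cb in zip(px_a, px_b))
                count += len(px_a)
        results.append(total / count if count else 0.0)
    return results


# Synthetic 4x2 images: only the bottom half differs (R +20, G -10).
untiled = [[(100, 100, 100)] * 2 for _ in range(4)]
tiled = [[(100, 100, 100)] * 2 for _ in range(2)] + [[(120, 90, 100)] * 2 for _ in range(2)]
print(band_differences(untiled, tiled, n_bands=2))  # [0.0, 10.0]
```

Clearly higher values in the lower bands would quantify the observation in the original report; near-zero values everywhere would suggest the two decodes match.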

You're welcome. And thank you for looking into it!

(seinan9, Apr 05 '24 21:04)

I tried to reproduce this today. It turns out that there was no regression in VAE tiling behavior. During the switch from tiled_decode to force_tiled_decode, there was a period in which we weren't applying the force_tiled_decode config.

For example, look at the v3.6.2 tag:

  • force_tiled_decode was present in the config and tiled_decode was deprecated: https://github.com/invoke-ai/InvokeAI/blob/v3.6.2/invokeai/app/services/config/config_default.py#L272
  • But, we were still using tiled_decode in the codebase: https://github.com/invoke-ai/InvokeAI/blob/v3.6.2/invokeai/app/invocations/latent.py#L857

This was eventually fixed in https://github.com/invoke-ai/InvokeAI/commit/897fe497dc70012cdd2680ca9a297f35545f7817.
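The failure mode described above, where the setting users could change (`force_tiled_decode`) was not the one the decode code read (`tiled_decode`), is easy to reproduce in miniature. The sketch below uses a plain dict in place of InvokeAI's real config object, and both functions are hypothetical, not InvokeAI code. It contrasts a reader that only checks the deprecated key with one that prefers the new key and falls back to the old one with a `DeprecationWarning`.

```python
import warnings


def read_tiled_decode_buggy(config: dict) -> bool:
    """Reads only the deprecated key, so `force_tiled_decode` is silently ignored."""
    return bool(config.get("tiled_decode", False))


def resolve_force_tiled_decode(config: dict) -> bool:
    """Prefer `force_tiled_decode`; fall back to the deprecated `tiled_decode`."""
    if "force_tiled_decode" in config:
        return bool(config["force_tiled_decode"])
    if "tiled_decode" in config:
        warnings.warn(
            "'tiled_decode' is deprecated; use 'force_tiled_decode' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return bool(config["tiled_decode"])
    return False  # assumed default: tiled decoding off


user_config = {"force_tiled_decode": True}
print(read_tiled_decode_buggy(user_config))     # False: the setting has no effect
print(resolve_force_tiled_decode(user_config))  # True
```

Letting the new key take precedence means a stale `tiled_decode` entry cannot override an explicit `force_tiled_decode`.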

I tested VAE tiling in older versions of Invoke via workflows and saw the same bad VAE tiling artifacts as in the latest version of Invoke. Unfortunately, these tiling artifacts are expected in the current diffusers implementation of VAE tiling, as discussed on the original PR: https://github.com/huggingface/diffusers/pull/1441

I'm going to do a little experimentation to see if I can improve things by modifying the tile dimensions/overlaps. But a proper fix would be a bigger project.

(RyanJDick, Jun 26 '24 14:06)
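Experimenting with tile dimensions and overlaps, as proposed above, mostly means changing two numbers: the tile size and the overlap fraction. The sketch below reproduces, along one axis, the arithmetic we understand diffusers' `AutoencoderKL.tiled_decode` to use: a latent stride of `tile_latent_min_size * (1 - tile_overlap_factor)`, a pixel-space blend width of `tile_sample_min_size * tile_overlap_factor`, a linear cross-fade across that width, and each tile cropped to `tile_sample_min_size - blend_extent` pixels. The helper names, the 8x latent-to-pixel scale, and the 64-latent (512-pixel) tile in the example are assumptions; check them against the diffusers version and VAE in use.

```python
def tile_layout(latent_len, tile_latent, overlap_factor=0.25, scale=8):
    """Tile start offsets (latent units) along one axis, plus the pixel-space
    blend width and the number of pixels each tile contributes after cropping."""
    stride = int(tile_latent * (1 - overlap_factor))
    starts = list(range(0, latent_len, stride))
    tile_sample = tile_latent * scale
    blend_extent = int(tile_sample * overlap_factor)
    row_limit = tile_sample - blend_extent
    return starts, blend_extent, row_limit


def blend_1d(prev_tile, next_tile, blend_extent):
    """Linearly cross-fade the end of `prev_tile` into the start of `next_tile`.

    Returns a new list whose first `extent` samples mix the overlapping samples
    of `prev_tile` (weight decreasing linearly from 1) with `next_tile`.
    """
    extent = min(len(prev_tile), len(next_tile), blend_extent)
    out = list(next_tile)
    for y in range(extent):
        w = y / extent
        out[y] = prev_tile[-extent + y] * (1 - w) + next_tile[y] * w
    return out


# 768 px tall image -> 96 latent rows; assumed 64-latent (512 px) tiles.
print(tile_layout(96, 64))                      # ([0, 48], 128, 384)
print(tile_layout(96, 64, overlap_factor=0.5))  # ([0, 32, 64], 256, 256)
print(blend_1d([1.0] * 4, [0.0] * 4, 4))        # [1.0, 0.75, 0.5, 0.25]
```

A larger overlap factor widens the cross-fade and adds tiles, which can soften visible seams at extra compute cost. It does not give any tile access to the rest of the image, though, so differences in how each tile is processed can remain.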

It's unfortunate that the problem lies within the diffusers implementation. For me it is not an urgent problem, but I am still grateful that you are looking into it. Thanks!

(seinan9, Jun 27 '24 17:06)

I had the same issue with EasyDiffusion last year. They added a switch to disable VAE tiling. It seems that's the only fix right now. https://github.com/easydiffusion/easydiffusion/issues/1442

(ufuksarp, Jun 27 '24 18:06)