stable-diffusion-webui

Use Spandrel for upscaling and face restoration architectures

Open • akx opened this issue 1 year ago

Description

This PR yeets most of the copy-pasted or otherwise vendored model architectures in favor of just using Spandrel.

  • LDSR is not converted, since it doesn't exist in Spandrel.
  • There's still more cleanup that could be done: for one, there are currently multiple implementations of tiled inference, and the model loading/downloading/... code is kind of a mess (I should continue where I left off in #10823), but I'll hold off on that for this PR.
  • A follow-up PR would add support for HAT models in about 42 lines of code. (Got a POC already.)
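Since several tiled-inference implementations are mentioned above, here is a minimal sketch of the one piece they all need: choosing tile start offsets along a single axis for a given tile size and overlap. The function name and the "last tile flush with the far edge" policy are assumptions for illustration, not the code in this PR or in upscaler_utils.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Return start offsets of tiles that cover [0, length) along one axis.

    Consecutive tiles overlap by at least `overlap` pixels; the last tile is
    shifted back so it ends exactly at `length` (so it may overlap more).
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    if length <= tile:
        return [0]  # a single tile covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile flush with the far edge
    return starts


# With an upscale factor s, input tile [x, x + tile) maps to output [x * s, (x + tile) * s).
print(tile_starts(10, 4, 1))  # [0, 3, 6]
```

Running the same offsets over both height and width gives the tile grid; blending the overlapping regions is a separate choice each implementation makes.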

Screenshots/videos:

No visual changes. This seems to Work On My Machine, but it'd be lovely if someone else tried this out too.

Checklist:

akx, Dec 25 '23 13:12

Oh yeah, I've been using this PR for a couple of days now; it works.

gel-crabs, Dec 28 '23 23:12

@gel-crabs Thanks for trying it out! I (force-)pushed this branch to update spandrel to a newer version, as well as add experimental support for HAT upscalers, if you want to try that out. (You'll need to bring your own models and put them in models/HAT/.)

akx, Dec 29 '23 10:12


It works! Admittedly it has issues with deepcache, where it adds black splotches to the image during hires fix, but otherwise it's working.

I tried to hack in support for DAT as well by copying hat_model.py and replacing HAT with DAT, but it just made the image go full black.

Edit: It actually has nothing to do with deepcache, or any extensions at all. I'm going to try testing with different models.

I tried a different 4x HAT upscaler and it gives fully black images, so HAT support doesn't seem to be working correctly.

gel-crabs, Dec 29 '23 21:12

I'm generally not pumped about adding new dependencies, but this removes a lot of code we just copy pasted, so that seems nice.

Some questions:

  • what's with __init__.py?
  • what's with commented code in webui.py?
  • for tests, on the new machine (which is always the case for github servers), it looks to me that it will download the model. Maybe those tests could be disabled by default? Also since you're not actually checking any changes in faces, we could reuse the existing img2img_basic.png instead of adding a new pic.
  • what happens when you put a checkpoint in a wrong dir? Say, ESRGAN checkpoint into swinir dir. Or a codeformer model into ESRGAN dir?
  • did you test all models you converted to use spandrel?
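One way to act on the "disabled by default" suggestion, sketched with the standard library's unittest so it is self-contained (a pytest suite would use pytest.mark.skipif the same way). The environment variable RUN_MODEL_DOWNLOAD_TESTS and the test names are hypothetical: tests that would download model weights are skipped unless a CI job explicitly opts in.

```python
import os
import unittest

# Hypothetical opt-in switch: tests that download model weights only run when
# RUN_MODEL_DOWNLOAD_TESTS=1 is set (e.g. in a CI job that caches models/).
RUN_DOWNLOAD_TESTS = os.environ.get("RUN_MODEL_DOWNLOAD_TESTS") == "1"

requires_model_download = unittest.skipUnless(
    RUN_DOWNLOAD_TESTS,
    "set RUN_MODEL_DOWNLOAD_TESTS=1 to run tests that download models",
)


class FaceRestorationTests(unittest.TestCase):
    @requires_model_download
    def test_restoration_changes_face_image(self):
        # Placeholder: a real test would load the face restoration model and
        # check that the output differs from the input face image.
        raise NotImplementedError("sketch only")
```

Without the variable set, the test is reported as skipped rather than silently passing, so it stays visible in the test summary.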

AUTOMATIC1111, Dec 30 '23 11:12

I'm generally not pumped about adding new dependencies, but this removes a lot of code we just copy pasted, so that seems nice.

I think this actually leads to fewer dependencies in total (I'll run the numbers later). The Spandrel folks seem nice and responsive too. :)

  • what's with __init__.py?

Autogenerated by PyCharm when refactoring code. Will yeet, my bad.

  • what's with commented code in webui.py?

Also accidentally added to this PR (since I was tired of having a gazillion WebUI tabs get auto-opened), my bad. Will yeet.

  • for tests, on the new machine (which is always the case for github servers), it looks to me that it will download the model.

I can also add an actions/cache action so we cache the models/ directory (like Spandrel's tests do).

Also since you're not actually checking any changes in faces, we could reuse the existing img2img_basic.png instead of adding a new pic.

Since we use facexlib to detect faces and only act on the face patches, an image without any faces will never exercise the code that actually runs the Spandrel model 😁

I'll add a simple "output image was different" check!
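A minimal sketch of such an "output image was different" check, assuming both images are available as raw pixel bytes of the same mode and size (for example via PIL's Image.tobytes()). The helper names and the default threshold of 1.0 are hypothetical; a small threshold guards against outputs that differ only by rounding noise.

```python
def mean_abs_diff(before: bytes, after: bytes) -> float:
    """Mean absolute per-byte difference between two equally sized raw images."""
    if len(before) != len(after):
        raise ValueError("images must have the same size and mode")
    if not before:
        return 0.0
    # Iterating over bytes yields ints in 0..255.
    return sum(abs(a - b) for a, b in zip(before, after)) / len(before)


def assert_output_changed(before: bytes, after: bytes, min_diff: float = 1.0) -> None:
    """Fail if the restored image is (nearly) identical to the input."""
    diff = mean_abs_diff(before, after)
    assert diff >= min_diff, f"output barely changed (mean abs diff {diff:.3f}); was a face detected?"
```

If facexlib finds no face, the restorer would typically hand back the input untouched, which this check would flag.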

  • what happens when you put a checkpoint in a wrong dir? Say, ESRGAN checkpoint into swinir dir. Or a codeformer model into ESRGAN dir?

Good question: since Spandrel auto-detects the model architecture from the checkpoint, it'd happily load it, and then maybe fail with a parameter error down the line when we call the architecture with kwargs it doesn't accept. I can add isinstance checks to verify that we loaded the expected architecture (and warn and fail if not) instead of just blindly forging ahead.
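The proposed check could look roughly like this. Spandrel's actual return type is not shown in this thread, so LoadedModel, its fields and require_architecture are hypothetical stand-ins; the idea is simply to compare the auto-detected architecture with the one implied by the model directory and refuse to continue on a mismatch. An isinstance check against the expected architecture class would follow the same shape.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class LoadedModel:
    """Hypothetical stand-in for what an auto-detecting loader returns."""

    path: str
    architecture: str  # e.g. "ESRGAN" or "SwinIR", as detected from the checkpoint
    scale: int


def require_architecture(model: LoadedModel, expected: str) -> LoadedModel:
    """Warn and fail if the detected architecture is not the expected one."""
    if model.architecture != expected:
        message = (
            f"{model.path!r} looks like a {model.architecture} model, "
            f"but this upscaler expects {expected}"
        )
        logger.warning(message)
        raise TypeError(message)
    return model
```

With this, an ESRGAN checkpoint dropped into the SwinIR directory fails at load time with a clear message instead of a parameter error later.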

  • did you test all models you converted to use spandrel?

I did, on my machine (a MacBook).

akx, Dec 30 '23 13:12

Looks like SwinIR x2 is not working now. I get this with any model:

File "...\modules\images.py", line 286, in resize_image
  res = resize(im, width, height)
File "...\modules\images.py", line 278, in resize
  im = upscaler.scaler.upscale(im, scale, upscaler.data_path)
File "...\modules\upscaler.py", line 65, in upscale
  img = self.do_upscale(img, selected_model)
File "...\extensions-builtin\SwinIR\scripts\swinir_model.py", line 48, in do_upscale
  img = upscaler_utils.upscale_2(
File "...\modules\upscaler_utils.py", line 181, in upscale_2
  output = tiled_upscale_2(
File "...\modules\upscaler_utils.py", line 149, in tiled_upscale_2
  ].add_(out_patch)
RuntimeError: The size of tensor a (2560) must match the size of tensor b (1280) at non-singleton dimension 3
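The numbers in this error are consistent with a scale mismatch. Assuming the tile spanned 640 input pixels along the width (dimension 3), an output buffer sized for scale 4 expects 640 × 4 = 2560 columns, while an x2 model returns only 640 × 2 = 1280. A sketch of a defensive helper (hypothetical, not part of upscaler_utils) that derives the scale from an actual input/output patch pair instead of trusting a parameter:

```python
def infer_scale(in_hw: tuple[int, int], out_hw: tuple[int, int]) -> int:
    """Derive an integer upscale factor from a (height, width) input/output pair."""
    (in_h, in_w), (out_h, out_w) = in_hw, out_hw
    if in_h <= 0 or in_w <= 0:
        raise ValueError("input size must be positive")
    if out_h % in_h or out_w % in_w:
        raise ValueError(f"{out_hw} is not an integer multiple of {in_hw}")
    scale_h, scale_w = out_h // in_h, out_w // in_w
    if scale_h != scale_w:
        raise ValueError(f"non-uniform scale: {scale_h} vs {scale_w}")
    return scale_h


# A 2x model on a 640x640 tile: the output buffer must be sized with 2, not a hard-coded 4.
scale = infer_scale((640, 640), (1280, 1280))
print(scale, 640 * scale, 640 * 4)  # 2 1280 2560
```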

wcde, Jan 03 '24 10:01

@wcde Thanks, I'll take a peek – what's your SwinIR tile size and overlap setting, and the size of the image you're trying to upscale?

akx, Jan 03 '24 12:01

The code hardcodes the scale to 4. It should be something like this:

img = upscaler_utils.upscale_2(
    img,
    model,
    tile_size=shared.opts.SWIN_tile,
    tile_overlap=shared.opts.SWIN_tile_overlap,
    scale=model.scale,  # use the loaded model's own scale (e.g. 2 for an x2 model) instead of a hard-coded 4
    desc="SwinIR",
)

Second problem: the model is loaded with dtype devices.dtype, but upscale_2 casts the input to fp32:

tensor = pil_image_to_torch_bgr(img).float()

This gives:

RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same
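One common remedy, sketched with plain PyTorch rather than the webui's own helpers (match_model_dtype is a hypothetical name), is to cast the input to whatever dtype the model's parameters use instead of hard-coding .float(). This only matches dtypes; whether a given architecture produces correct results in half precision is a separate question.

```python
import torch


def match_model_dtype(model: torch.nn.Module, tensor: torch.Tensor) -> torch.Tensor:
    """Cast `tensor` to the dtype of the model's first parameter, if it has any."""
    try:
        param_dtype = next(model.parameters()).dtype
    except StopIteration:
        return tensor  # parameter-free module: nothing to match
    return tensor.to(dtype=param_dtype)


model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1).half()  # stands in for an fp16-loaded upscaler
image = torch.rand(1, 3, 8, 8)  # float32, like the result of .float()
print(match_model_dtype(model, image).dtype)  # torch.float16
```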

wcde, Jan 03 '24 19:01

@wcde In fairness, scale has always been hard-coded to 4 unless I overlooked something: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/cf2772fab0af5573da775e7437e6acdca424f26e/extensions-builtin/SwinIR/scripts/swinir_model.py#L63

I'll take a look at the half issue, thanks for pointing it out.

akx, Jan 03 '24 20:01

I guess this will happen with a lot of extensions after updating. Maybe it should be mentioned in the changelog?

light-and-ray, Jan 28 '24 04:01