
Comparison discussion

Open x-legion opened this issue 1 year ago • 25 comments

MultiDiffusion seems to be doing worse (not sharp), or am I doing something wrong? Original: image

MultiDiffusion: image Ultimate SD Upscale: image

x-legion avatar Mar 07 '23 06:03 x-legion

Hello, would you please provide your weights (including the checkpoint & LoRA, if you used one) for your original image? I need them to reproduce your results; without them my reproduction comes out in an oil-painting style. MultiDiffusion results can be severely affected by the model checkpoint & LoRA you used.

But generally speaking, an extraordinarily high CFG Scale and a slightly higher denoising value will give you satisfying details. Example positive prompts are "highres, masterpiece, best quality, ultra-detailed unity 8k wallpaper, extremely clear, very clear, ultra-clear". You don't need any concrete subjects in the positive prompt; just drag the CFG Scale to an extra-large value. Denoising values between 0.1 and 0.4 are all OK, but the content will change accordingly.
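
As a concrete illustration (not from this thread), the advice above can be expressed as an img2img request payload for the WebUI's built-in API. The `/sdapi/v1/img2img` endpoint and field names below come from AUTOMATIC1111's API; the specific values are simply the ranges suggested here:

```python
# Illustrative img2img settings following the advice above: generic quality
# prompt, extra-large CFG scale, denoising strength between 0.1 and 0.4.
payload = {
    "prompt": "highres, masterpiece, best quality, ultra-detailed unity 8k wallpaper, "
              "extremely clear, very clear, ultra-clear",
    "negative_prompt": "blurry, lowres, jpeg artifacts",
    "sampler_name": "DPM++ SDE Karras",   # sampler used in the example result below
    "cfg_scale": 20,                      # extraordinarily high CFG for detail
    "denoising_strength": 0.3,            # 0.1-0.4 all work; content shifts with it
    "steps": 24,
}

# Sanity-check the recommended ranges before sending the request
# (e.g. via requests.post(f"{url}/sdapi/v1/img2img", json=payload)).
assert 0.1 <= payload["denoising_strength"] <= 0.4
assert payload["cfg_scale"] >= 14
```

This is a sketch of the settings only; extension-specific script arguments (tile sizes, upscaler choice) are configured in the UI and are not shown here.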

Here is my result with CFG=20, Sampler=DPM++ SDE Karras, denoising strength=0.3, for example. As I use the protogenX34 checkpoint, my painting style will be wildly different from yours:

00064-2792530863-20230307100606

Please comment on this issue if you find your results significantly improved after using a proper model and CFG values.

pkuliyi2015 avatar Mar 07 '23 10:03 pkuliyi2015

Hi there, I'll write here rather than create a new issue about a similar thing. Would it be possible to write down or screenshot all the settings used to upscale the picture attached in the extension description? I think I've tested everything, but all I get is a blurry upscaled picture. Here is one example result that shows how blurry it is (not to mention the lack of extra details with denoise at 0.3 and CFG at 20, for example). At the moment I want to copy everything 1:1 to see whether the issue is on my side. Thanks for creating this extension; I have high hopes. Example picture.

jurandfantom avatar Mar 09 '23 15:03 jurandfantom

Hello, as requested, here is the PNG info: image

Here is the text version for your convenience. All resources are public, but I'm quite busy and cannot provide links.

masterpiece, best quality, highres, extremely detailed 8k unity wallpaper, ultra-detailed
Negative prompt: EasyNegative
Steps: 24, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 1614054406, Size: 4096x3200, Model hash: 2ccfc34fe3, Model: 0.9(Gf_style2) + 0.1(abyssorangemix2_Hard), Denoising strength: 0.4, Clip skip: 3, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 4, MultiDiffusion tile width: 128, MultiDiffusion tile height: 128, MultiDiffusion overlap: 64

If you don't know any of them, you can Google it. But your result likely comes from poor positive and negative prompts; I use a Textual Inversion called EasyNegative from civitai.com.

pkuliyi2015 avatar Mar 09 '23 17:03 pkuliyi2015

Click Here for Better Comparison View

original image

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3857533696, Size: 640x960, Model: dreamniji3fp16, Clip skip: 2, ENSD: 31337, Discard penultimate sigma: True

Ultimate SD upscaler image

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, Ultimate SD upscale upscaler: 4x_foolhardy_Remacri, Ultimate SD upscale tile_width: 768, Ultimate SD upscale tile_height: 768, Ultimate SD upscale mask_blur: 8, Ultimate SD upscale padding: 32, Discard penultimate sigma: True

MultiDiffusion image

masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 2, Discard penultimate sigma: True

x-legion avatar Mar 09 '23 18:03 x-legion

Ok, now I know it might be something wrong on my side. I can see additional details (I'll check whether it's because of clip skip 3, the upscaler, or something else), but it's still blurry. That's super weird. Ah, and thanks for the reply. The pictures attached to the description don't have info attached (that's why I asked :) ) 00147-1803174913

jurandfantom avatar Mar 09 '23 18:03 jurandfantom

https://imgsli.com/MTYwOTcx same here again

x-legion avatar Mar 09 '23 18:03 x-legion

Hello, thanks for your interest in this work. I tried your image for several minutes, and here is my result with no tuning: https://imgsli.com/MTYxMDI5.

It's hard to tell which is better; if you like illustration-style sharpness and faithfulness to the original image, maybe Ultimate SD Upscaler + 4x-UltraSharp is your best choice. But personally I like to see some fabricated details on a realistic human face, so I prefer this tool.

It's worth noting that the biggest difference between MultiDiffusion and other upscalers is that it currently doesn't support any concrete content in the prompt when you upscale an image; otherwise each tile will contain a small character, and your image ends up blurry and messy.

The correct prompts are just as follows. I don't even use a LoRA:

image

And my configurations, FYI:

image

pkuliyi2015 avatar Mar 09 '23 23:03 pkuliyi2015

I provide the PNG info

I tried to replicate your settings with an image provided by the OP, and it's still very blurry:

image

Compared to the image you sent:

image

As you can see, the settings are pretty much the same except the CFG scale:

image

DenkingOfficial avatar Mar 09 '23 23:03 DenkingOfficial

Update: Oh, I just noticed that EasyNegative is a Textual Inversion from civitai.com; it is not a word. Please download that Textual Inversion.

Here is the link: https://civitai.com/models/7808/easynegative

The upscalers are important too. I personally use two: 4x-UltraSharp and 4x-Remacri. Here is the link: https://upscale.wiki/wiki/Model_Database where you can find both upscalers; put them in your ESRGAN folder.

pkuliyi2015 avatar Mar 09 '23 23:03 pkuliyi2015

4x-remacri

I used it with the image above

EasyNegative is a textual inversion

Already downloaded this embedding

DenkingOfficial avatar Mar 09 '23 23:03 DenkingOfficial

4x-remacri

I used it with the image above

Do you use EasyNegative embeddings?

You mean you have used it in the above images?

pkuliyi2015 avatar Mar 09 '23 23:03 pkuliyi2015

You mean you have used it in the above images?

Yes, it was used

UPD:

image

DenkingOfficial avatar Mar 09 '23 23:03 DenkingOfficial

You mean you have used it in the above images?

Yes, it was used

UPD:

I spent some time finding the original PNG info. Here it is; please try to reproduce using my params: image

pkuliyi2015 avatar Mar 09 '23 23:03 pkuliyi2015

It may not be as easy to use as the Ultimate Upscaler, as it's essentially a complete redraw without post-processing. Personally I have some intuitions for using it:

  • No concrete positive prompts. Just something like clear, very clear, ultra clear
  • Don't use too large a tile size, as SD 1.4 is only good at 512-768 (divide by 8 to get 64-96).
  • Large CFG scales, Euler a & DPM++ SDE Karras, denoising = 0.2-0.4
  • Try both 4x-UltraSharp and 4x-Remacri
  • Clip Skip = 2 or 3 is worth trying.
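
To make the tile-size rule of thumb explicit, here is a tiny sketch (my own, not the extension's code) of the pixel-to-latent conversion: the tile width/height sliders are in latent units, which for SD 1.x are 1/8 of pixel units:

```python
LATENT_SCALE = 8  # the SD 1.x VAE downscales by 8 in each spatial dimension

def latent_tile(pixels: int) -> int:
    """Convert a pixel-space tile size to the latent-space value the UI expects."""
    return pixels // LATENT_SCALE

# SD 1.4/1.5 works best at 512-768 px, i.e. latent tiles of 64-96.
assert latent_tile(512) == 64
assert latent_tile(768) == 96
```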

pkuliyi2015 avatar Mar 09 '23 23:03 pkuliyi2015

please try to reproduce using my params

I just did it and it's a lot better

image

Settings (Even seed is the same):

image

But it still can't generate a result as good as yours. I know it depends highly on hardware, but there's a very large difference in details. No optimizations were used (such as xformers, opt-split-attention, etc.).

My: image

And yours: image

DenkingOfficial avatar Mar 10 '23 00:03 DenkingOfficial

please try to reproduce using my params

I just did it and it's a lot better

image

Settings (Even seed is the same):

image

But it still can't generate a result as good as yours. I know it depends highly on hardware, but there's a very large difference in details. No optimizations were used (such as xformers, opt-split-attention, etc.).

My: image

And yours: image

I'm also confused. Are you using this model?

https://civitai.com/models/3666/protogen-x34-photorealism-official-release

I see our model hashes are different. Apart from that, I couldn't find anything else.

pkuliyi2015 avatar Mar 10 '23 00:03 pkuliyi2015

I'm also confused. Are you using this model?

Yes, I used protogen_x3.4, but pruned. I've now downloaded the 5GB version with the same hash as yours, and THAT'S AMAZING.

A huge improvement in details:

image

It still doesn't produce the exact same result as yours. I guess it depends on hardware, but the details are unbelievable; I can clearly see the stitch seam on the sleeve.

DenkingOfficial avatar Mar 10 '23 08:03 DenkingOfficial

Oh, thanks for your feedback. I didn't know that a pruned model could affect the details until your test.

pkuliyi2015 avatar Mar 10 '23 09:03 pkuliyi2015

Ohh! I think not many people know that, to be honest o_O As far as I understand pruning, it should not affect a task like upscaling via small tiles? I'm going to try with the non-pruned model as well and let you know.

Edit: No clue why, but today everything works as it should. Maybe everything needs to be turned off and on again, not just a UI restart, just like when installing Dreambooth.

jurandfantom avatar Mar 10 '23 09:03 jurandfantom

Tried it, and to be honest the ESRGAN upscalers do 99% of the lifting; it barely does anything when used with Lanczos, unless there are going to be examples with Lanczos where it introduces new details? The best bet is to just upscale with ESRGAN by 2x and then go to inpaint, masking the parts one by one to upscale them, since you'll have more pixel area to resolve detail. So unless someone automates that, it's going to stay the best way to upscale.

2blackbar avatar Mar 10 '23 14:03 2blackbar

More tests. ControlNet doesn't work, or it needs a much lower denoise than I used. The attached result was upscaled in two passes plus the Dynamic CFG script. Agreed, it's way off from the original picture, but now that I know what and where, it's time for fine-tuning (hopefully I'll figure out the issue with ControlNet). 00034-715773611 - Copy Indeed, it's essential to test a couple of upscalers, because the differences are huge, even bigger than between SD models.

jurandfantom avatar Mar 10 '23 15:03 jurandfantom

23,03,10 - 16,01,21 - 7331 a Left is mine, right is pkuliyi2015's. As you can see, the left has way more detail, but some noise and weird issues as well; pure Remacri x4 looks almost like pkuliyi2015's version. Plenty of room for tests.

jurandfantom avatar Mar 10 '23 16:03 jurandfantom

Tried it, and to be honest the ESRGAN upscalers do 99% of the lifting; it barely does anything when used with Lanczos, unless there are going to be examples with Lanczos where it introduces new details? The best bet is to just upscale with ESRGAN by 2x and then go to inpaint, masking the parts one by one to upscale them, since you'll have more pixel area to resolve detail. So unless someone automates that, it's going to stay the best way to upscale.

This is basically a tile-by-tile img2img SD redraw, so if you don't give it high strength, it won't work as you expect. However, one of its weaknesses is that it currently cannot automatically map your prompts to different areas... If you could use stronger prompts, it would be much better.

But I'm working on automatic prompt mapping. In img2img it works by first estimating the attention map of your prompt over the original picture, and then re-applying it to the MultiDiffusion tiles. In txt2img this may be similar, but I need time to do it.

Try this one: https://github.com/dustysys/ddetailer.git

x-legion avatar Mar 11 '23 16:03 x-legion

Tried it, and to be honest the ESRGAN upscalers do 99% of the lifting; it barely does anything when used with Lanczos, unless there are going to be examples with Lanczos where it introduces new details? The best bet is to just upscale with ESRGAN by 2x and then go to inpaint, masking the parts one by one to upscale them, since you'll have more pixel area to resolve detail. So unless someone automates that, it's going to stay the best way to upscale.

I'm sorry for the accidental wrong edit.

This is basically a tile-by-tile img2img SD redraw, so if you don't give it high strength, it won't work as you expect. However, one of its weaknesses is that it currently cannot automatically map your prompts to different areas... If you could use stronger prompts, it would be much better.

But I'm working on automatic prompt mapping. In img2img it works by first estimating the attention map of your prompt over the original picture, and then re-applying it to the MultiDiffusion tiles. In txt2img this may be similar, but I need time to do it.
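
For readers unfamiliar with how a tile-by-tile redraw can still produce a seamless image, here is a minimal 1-D sketch (mine, not the extension's actual code) of MultiDiffusion-style fusion: each overlapping tile is denoised independently, and values in the overlap regions are averaged back together:

```python
def fuse_tiles(length, tile, stride, denoise_tile):
    """Average per-tile results over a 1-D 'latent' of the given length.

    denoise_tile(start, end) stands in for one img2img denoising step on a
    tile. Overlap (tile > stride) is what hides the seams between tiles.
    """
    acc = [0.0] * length   # accumulated values per position
    cnt = [0] * length     # how many tiles covered each position
    start = 0
    while start < length:
        end = min(start + tile, length)
        vals = denoise_tile(start, end)
        for i, v in zip(range(start, end), vals):
            acc[i] += v
            cnt[i] += 1
        if end == length:
            break
        start += stride
    return [a / c for a, c in zip(acc, cnt)]

# Sanity check: a constant 'denoiser' must survive fusion unchanged.
flat = fuse_tiles(10, tile=4, stride=2, denoise_tile=lambda s, e: [1.0] * (e - s))
assert flat == [1.0] * 10
```

The real extension does the same thing over 2-D latent tiles, which is why overlap and tile size matter so much to the final sharpness.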

pkuliyi2015 avatar Mar 11 '23 19:03 pkuliyi2015

The key point is that I need a user interface for drawing bounding boxes, so that you can draw rectangles and control MultiDiffusion with different prompts. That way the results should get much better.

Why? Because then you could just select the woman's face and tell SD to draw a beautiful woman's face. SD would then do its best, using its full 512 * 512 resolution to draw ONLY the face. The effective resolution would be unprecedentedly high for SD models, since the model dedicates the best of its capability to just one part of the image.
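
To illustrate what such a bounding box would need to carry (a hypothetical data layout of my own; the feature did not exist at the time of this comment), each rectangle is a pixel-space region paired with its own prompt, mapped down to latent coordinates before tiling:

```python
from dataclasses import dataclass

LATENT_SCALE = 8  # SD 1.x latent space is 1/8 of pixel space per dimension

@dataclass
class RegionPrompt:
    """One user-drawn rectangle with its own prompt (hypothetical layout)."""
    x: int
    y: int
    w: int
    h: int
    prompt: str

    def to_latent(self):
        """Map the pixel-space rectangle into latent-space coordinates."""
        return (self.x // LATENT_SCALE, self.y // LATENT_SCALE,
                self.w // LATENT_SCALE, self.h // LATENT_SCALE)

# The face region gets its own dedicated prompt at full model resolution.
face = RegionPrompt(x=256, y=64, w=512, h=512, prompt="a beautiful woman's face")
assert face.to_latent() == (32, 8, 64, 64)
```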

However, while I was adding features I ran into this f**king issue: https://github.com/gradio-app/gradio/issues/2316

Someone submitted a PR for a bbox tool, but the maintainers declined to merge it: https://github.com/gradio-app/gradio/pull/3220

I don't know what they were thinking in rejecting such a good PR (from my perspective) without providing their own solution. It has been half a year since it was first proposed.

So it will be hard to draw rectangles on images directly. I must find another way to draw rectangles. Do you have any other ideas?

pkuliyi2015 avatar Mar 12 '23 03:03 pkuliyi2015

So it will be hard to draw rectangles on images directly. I must find another way to draw rectangles. Do you have any other idea?

Check out this extension: https://github.com/hnmr293/sd-webui-llul

It fakes it by having you move a rectangle around in a separate window.

image

ManOrMonster avatar Mar 12 '23 19:03 ManOrMonster

https://www.reddit.com/r/StableDiffusion/comments/11pyiro/new_feature_zoom_enhance_for_the_a111_webui/

New Feature: "ZOOM ENHANCE" for the A111 WebUI. Automatically fix small details like faces and hands!

Hello, fellow Stable Diffusion users! I'm excited to share with you a new feature that I've added to the Unprompted extension: it's the [zoom_enhance] shortcode.

If you're not familiar with Unprompted, it's a powerful extension that lets you use various shortcodes in your prompts to enhance your text generation experience. You can learn more about it here.

The [zoom_enhance] shortcode is inspired by the fictional technology from CSI, where they can magically zoom in on any pixelated image and reveal crisp details. Of course, this is not possible in real life, but we can get pretty close with Stable Diffusion and some clever tricks.

The shortcode allows you to automatically upscale small details within your image where Stable Diffusion tends to struggle. It is particularly good at fixing faces and hands in long-distance shots.

How does it work?

The [zoom_enhance] shortcode searches your image for the specified target(s), crops out the matching regions, and processes them through [img2img]. It then blends the result back into your original image. All of this happens behind the scenes without adding any unnecessary steps to your workflow. Just set it and forget it.
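
The crop-process-blend loop described above can be sketched in a few lines (a simplified 1-D illustration of the idea, not Unprompted's actual code): the reprocessed crop is pasted back with a feathered mask, so the boundary fades into the original instead of leaving a hard seam:

```python
def feather_mask(width, feather):
    """Linear ramp from ~0 to 1 at each edge of the crop, 1 in the middle."""
    mask = []
    for i in range(width):
        edge = min(i + 1, width - i)      # distance to the nearest crop edge
        mask.append(min(1.0, edge / feather))
    return mask

def blend(original, processed, x0, feather=3):
    """Paste `processed` over `original` starting at x0, feathering the edges."""
    out = list(original)
    mask = feather_mask(len(processed), feather)
    for i, (m, p) in enumerate(zip(mask, processed)):
        out[x0 + i] = m * p + (1.0 - m) * out[x0 + i]
    return out

# The 'original' is all zeros; the 'processed' crop is all ones.
orig = [0.0] * 10
fixed = blend(orig, [1.0] * 6, x0=2)
# Center of the crop is fully replaced; the edges ramp down toward the original.
assert fixed[4] == 1.0 and fixed[2] < 1.0 and fixed[0] == 0.0
```

In 2-D the same idea becomes a Gaussian-blurred alpha mask, which is what the feature list below refers to.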

Features and Benefits

  • Great in both txt2img and img2img modes.
  • The shortcode is powered by the [txt2mask] implementation of clipseg, which means you can search for literally anything as a replacement target, and you get access to the full suite of [txt2mask] settings, such as "padding" and "negative_mask."
  • It's also pretty good at deepfakes. Set mask="face" and replacement="another person's face" and check out the results.
  • It applies a Gaussian blur to the boundaries of the upscaled image, which helps it blend seamlessly with the original.
  • It is equipped with Dynamic Denoising Strength which is based on a simple idea: the smaller your replacement target, the worse it probably looks. Think about it: when you generate a character who's far away from the camera, their face is often a complete mess. So, the shortcode will use a high denoising strength for small objects and a low strength for larger ones.
  • It is significantly faster than Hires Fix and won't mess up the rest of your image.
  • Compatible with A111's color correction setting.
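
The Dynamic Denoising Strength idea above can be sketched as a simple interpolation (an illustrative formula of my own; the shortcode's actual mapping may differ): the smaller the target region relative to the image, the higher the denoising strength it receives:

```python
def dynamic_denoise(region_area, image_area, lo=0.2, hi=0.65):
    """Interpolate denoising strength from the region's share of the image.

    Tiny regions (e.g. a distant face) get close to `hi`; regions covering
    the whole image get `lo`. Purely illustrative, not Unprompted's formula.
    """
    frac = min(1.0, region_area / image_area)
    return hi - (hi - lo) * frac

# A face occupying 1% of the frame is denoised much harder than a half-frame one.
assert dynamic_denoise(100, 10_000) > dynamic_denoise(5_000, 10_000)
```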

How to use it?

To use this feature, you need to have Unprompted installed on your WebUI. If you don't have it yet, you can get it from here.

Once you have Unprompted, simply add this line anywhere in your prompt:

x-legion avatar Mar 13 '23 13:03 x-legion

I have investigated a new technique, DDNM (https://github.com/wyhuai/DDNM), which is very powerful for super-resolution, and it is also compatible with MultiDiffusion. In initial tests I found it amazing. I believe this can beat their new feature in a compelling way.

The automatic mask technique seems not very compatible with MultiDiffusion txt2img, but I will try it in img2img.

pkuliyi2015 avatar Mar 13 '23 14:03 pkuliyi2015

How long does it take you to upscale a photo, and how can it be made faster? Here are my settings: image image image

Vuhiep190297 avatar Mar 13 '23 18:03 Vuhiep190297

I have investigated a new technique, DDNM (https://github.com/wyhuai/DDNM), which is very powerful for super-resolution, and it is also compatible with MultiDiffusion. In initial tests I found it amazing. I believe this can beat their new feature in a compelling way.

The automatic mask technique seems not very compatible with MultiDiffusion txt2img, but I will try it in img2img.

Really impressive. Do you know of a user-friendly UI for DDNM? MultiDiffusion is a great idea, btw.

gabriel-filincowsky avatar Mar 13 '23 20:03 gabriel-filincowsky