multidiffusion-upscaler-for-automatic1111
Comparison discussion
MultiDiffusion seems to be doing worse (not sharp), or am I doing something wrong?
original:
MultiDiffusion:
Ultimate SD Upscale:
Hello, would you please provide your weights (including the checkpoint & LoRA, if you use one) for your original image? I need them to reproduce your results; without them, my reproduction comes out in an oil-painting style. MultiDiffusion results can be severely affected by the model checkpoint & LoRA you use.
Generally speaking, an extraordinarily high CFG Scale and a slightly higher denoising value will give you satisfying details. Example positive prompts are "highres, masterpiece, best quality, ultra-detailed unity 8k wallpaper, extremely clear, very clear, ultra-clear". You don't need anything concrete in the positive prompt; just drag the CFG Scale to an extra-large value. Denoising values between 0.1 and 0.4 all work, but the content will change accordingly.
Here is my result with CFG=20, Sampler=DPM++ SDE Karras, and denoising strength=0.3, for example. As I use the protogenX34 checkpoint, my painting style will be wildly different from yours:
Please comment on this issue if you find your results significantly improved after using a proper model and CFG values.
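For anyone scripting this rather than clicking through the UI, here is a minimal sketch of sending these settings through the AUTOMATIC1111 img2img web API. It assumes the WebUI is launched with --api; the endpoint and field names follow the public /sdapi/v1/img2img schema and may vary between versions:

```python
# Minimal sketch: the "no concrete content, high CFG, low-to-mid denoise"
# settings from above, sent through the AUTOMATIC1111 img2img API.
# Assumes the WebUI is running with --api; field names follow the public
# /sdapi/v1/img2img schema and may differ across versions.
import base64
import requests

with open("input.png", "rb") as f:
    init_image = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "init_images": [init_image],
    "prompt": "highres, masterpiece, best quality, ultra-detailed, extremely clear",
    "negative_prompt": "EasyNegative",  # textual inversion from civitai.com
    "sampler_name": "DPM++ SDE Karras",
    "steps": 24,
    "cfg_scale": 20,                    # extraordinarily high on purpose
    "denoising_strength": 0.3,          # 0.1-0.4 all work; content drifts as it rises
}

resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload)
resp.raise_for_status()
result_b64 = resp.json()["images"][0]
with open("output.png", "wb") as f:
    f.write(base64.b64decode(result_b64))
```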
Hi there, I'll write here rather than create a new issue about a similar thing. Would it be possible to write down or screenshot all the settings that were used to upscale the picture attached in the extension description? I think I've tested everything, but all I get is a blurred upscaled picture. Here is an example result that shows how blurry it is (not to mention the lack of extra detail with denoise at 0.3 and CFG at 20, for example). At the moment I want to copy everything 1:1 to see whether the issue is on my side. Thanks for creating this extension; I have high hopes. Example picture.
Hello, as you wish, I provide the PNG info:
Here is the text version for your convenience. All resources are public, but I'm quite busy and cannot provide links.
masterpiece, best quality, highres, extremely detailed 8k unity wallpaper, ultra-detailed Negative prompt: EasyNegative Steps: 24, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 1614054406, Size: 4096x3200, Model hash: 2ccfc34fe3, Model: 0.9(Gf_style2) + 0.1(abyssorangemix2_Hard), Denoising strength: 0.4, Clip skip: 3, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 4, MultiDiffusion tile width: 128, MultiDiffusion tile height: 128, MultiDiffusion overlap: 64
If you don't know any of them, you can Google them. But your result likely comes from poor positive and negative prompts; I use a Textual Inversion called EasyNegative from civitai.com.
Click Here for Better Comparison View
original
masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3857533696, Size: 640x960, Model: dreamniji3fp16, Clip skip: 2, ENSD: 31337, Discard penultimate sigma: True
Ultimate SD upscaler
masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, Ultimate SD upscale upscaler: 4x_foolhardy_Remacri, Ultimate SD upscale tile_width: 768, Ultimate SD upscale tile_height: 768, Ultimate SD upscale mask_blur: 8, Ultimate SD upscale padding: 32, Discard penultimate sigma: True
MultiDiffusion
masterpiece, best quality, portrait,
blue fire, silver hair, fox girl, mage, arm extended, holding blue fire, by jordan grimmer and greg rutkowski and pine ハイネ wlop, intricate, beautiful, trending artstation, pixiv, digital art, anime, no torch,
<lora:Noise:1.75>
Negative prompt: EasyNegative, lowres, ((bad anatomy)), ((bad hands)), text, missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, low quality, normal quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts))
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 3857533696, Size: 1280x1920, Model: dreamniji3fp16, Denoising strength: 0.4, Clip skip: 2, ENSD: 31337, Mask blur: 4, MultiDiffusion upscaler: 4x_foolhardy_Remacri, MultiDiffusion scale factor: 2, Discard penultimate sigma: True
Ok, now I know it might be something wrong on my side. I can see additional details (I'll check whether it's because of Clip skip 3, the upscaler, or something else), but it's still blurred. That's super weird. Ah, and thanks for the reply. The pictures attached to the description don't have info embedded (that's why I asked :) )
https://imgsli.com/MTYwOTcx same here again
Hello, thanks for your interest in this work. I tried for several minutes on your image and here is my result with no tuning: https://imgsli.com/MTYxMDI5.
It's hard to tell which is better; if you like illustration-style sharpness and faithfulness to the original image, maybe Ultimate SD Upscaler + 4x-UltraSharp is your best choice. But personally I like to see some fabricated details on a realistic human face, so I prefer this tool.
It's noteworthy that the biggest difference between MultiDiffusion and other upscalers is that it currently doesn't support any concrete content in the prompt when you upscale an image; otherwise each tile will contain a small character, and your image ends up blurry and messy.
The correct prompt is just as follows. I don't even use a LoRA:

And my configurations, FYI:

I provide the PNG info
I tried to replicate your settings with the image provided by OP and it's still very blurry:
Compared to the image you sent:
As you can see, the settings are pretty much the same except the CFG scale:
Update: Oh, I just noticed that EasyNegative is a textual inversion from civitai.com; it is not a word. Please download that textual inversion.
Here is the link: https://civitai.com/models/7808/easynegative
The upscalers are important too. I personally use two: 4x-UltraSharp and 4x-Remacri. Here is the link: https://upscale.wiki/wiki/Model_Database where you can find both upscalers; put them in your ESRGAN folder.
4x-remacri
I used it with the image above
EasyNegative is a textual inversion
Already downloaded this embedding
Do you use EasyNegative embeddings?
You mean you have used it in the above images?
You mean you have used it in the above images?
Yes, it was used
UPD:
I spent some time finding the original PNG info. Here it is; please try to reproduce using my params:
It may not be as easy to use as the Ultimate Upscaler, since it's essentially a complete redraw without post-processing. Personally, I have some intuitions for using it (see the sketch after the list):
- No concrete positive prompts. Just something like clear, very clear, ultra clear.
- Don't use too large a tile size, as SD 1.4 is only good at 512 - 768 (divide by 8 to get 64 - 96 in latent units).
- Large CFG Scales, Euler a & DPM++ SDE Karras, denoising = 0.2 - 0.4.
- Try both 4x-UltraSharp and 4x-Remacri.
- Clip Skip = 2 or 3 is worth trying.
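To make the numbers in this list concrete, here is an illustrative Python sketch (not part of the extension) of the pixel-to-latent tile arithmetic and the recommended knobs gathered in one place:

```python
# Quick sanity-check for the tips above: SD 1.x latents are 1/8 the pixel
# resolution, so a pixel tile of 512-768 corresponds to a latent tile of 64-96.
# This is an illustrative helper, not the extension's code.
def latent_tile_size(pixel_tile: int) -> int:
    """Convert a pixel-space tile edge to latent-space units (SD VAE factor 8)."""
    return pixel_tile // 8

for px in (512, 640, 768):
    print(f"{px}px tile -> latent {latent_tile_size(px)}")  # 64, 80, 96

# The other knobs from the list, as a single reference dict:
recommended = {
    "positive_prompt": "clear, very clear, ultra clear",  # no concrete content
    "samplers": ["Euler a", "DPM++ SDE Karras"],
    "cfg_scale": "large (e.g. 14-20)",
    "denoising_strength": (0.2, 0.4),
    "upscalers": ["4x-UltraSharp", "4x-Remacri"],
    "clip_skip": (2, 3),
}
```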
please try to reproduce using my params
I just did it and it's a lot better
Settings (even the seed is the same):
But it still can't generate a result as good as yours. I know it highly depends on hardware, but there's a very large difference in detail. No optimizations are used (such as xformers, opt-split-attention, etc.).
My:
And yours:
I'm also confused. Are you using this model?
https://civitai.com/models/3666/protogen-x34-photorealism-official-release
I see our model hashes are different. Other than that, I couldn't find anything else.
I'm also confused. Are you using this model?
Yes, I used protogen_x3.4, but pruned. Now I've downloaded the 5 GB version with the same hash as yours, and THAT'S AMAZING.
A very big improvement in details:
It still doesn't produce the exact same result as yours (I guess it depends on hardware), but the details are unbelievable; I can clearly see the stitch seam on the sleeve.
Oh, thanks for your feedback. I didn't know that a pruned model could affect the details until you tested it.
Ohh! I think not many people know that, to be honest o_O. As far as I understand pruning, it shouldn't affect a task like upscaling via small tiles? I'm going to try the non-pruned model as well and let you know.
Edit: No clue why, but today everything works as it should. Maybe everything needs to be turned off and on again, not just a UI restart, just like when installing Dreambooth.
Tried it, and to be honest the ESRGAN upscalers do 99% of the lifting; it barely does anything when used with Lanczos, unless there are going to be examples with Lanczos where it introduces new details? The best bet is to just upscale with ESRGAN by 2x and then inpaint, masking the parts one by one to upscale them, since you have more pixel area to resolve detail. So unless someone automates that, it's going to stay the best way to upscale.
More tests. ControlNet doesn't work, or it needs a much lower denoise than I used.
Upscaling for the attached image was done in two passes plus the dynamic CFG script. Agreed, it's way off from the original picture, but now that I know what goes where, it's time for fine tuning (hopefully I'll figure out the ControlNet issue).
Indeed, it's essential to test a couple of upscalers, because the differences are huge, even bigger than between SD models.
Left is mine, right is pkuliyi2015's.
As you can see, the left has way more detail, but some noise and weird issues as well; pure Remacri 4x looks almost like pkuliyi2015's version. Plenty of room for tests.
Tried it, and to be honest the ESRGAN upscalers do 99% of the lifting; it barely does anything when used with Lanczos, unless there are going to be examples with Lanczos where it introduces new details? The best bet is to just upscale with ESRGAN by 2x and then inpaint, masking the parts one by one to upscale them, since you have more pixel area to resolve detail. So unless someone automates that, it's going to stay the best way to upscale.
This is basically a tile-by-tile img2img SD redraw, so if you don't give it high strength it won't work as you expect. However, one of its weaknesses is that it currently cannot automatically map your prompts to different areas... If you could use stronger prompts, it would be way better.
But I'm working on Automatic Prompt Mapping. In img2img, it works by first estimating the attention map of your prompt over the original picture, and then re-applying it to the MultiDiffusion tiles. In txt2img it may be similar, but I need time to do so.
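For readers unfamiliar with the mechanism, the tile-by-tile redraw described above can be sketched as follows. `denoise_tile` is a hypothetical stand-in for a conditioned UNet denoising call; this illustrates the overlap-and-average fusion idea, not the extension's actual code:

```python
# Illustrative sketch of MultiDiffusion-style fusion: the latent is split into
# overlapping tiles, each tile is denoised independently, and overlapping
# predictions are averaged back together.
import torch

def fuse_tiles(latent: torch.Tensor, tile: int = 96, overlap: int = 48,
               denoise_tile=lambda x: x) -> torch.Tensor:
    """Split a (B, C, H, W) latent into overlapping tiles, denoise each
    independently, and average the overlapping predictions."""
    _, _, h, w = latent.shape
    th, tw = min(tile, h), min(tile, w)    # clamp tile to the latent size
    stride = tile - overlap
    out = torch.zeros_like(latent)
    count = torch.zeros_like(latent)
    for top in range(0, max(h - overlap, 1), stride):
        for left in range(0, max(w - overlap, 1), stride):
            b, r = min(top + th, h), min(left + tw, w)
            t, l = b - th, r - tw          # shift edge tiles inward to stay in bounds
            out[:, :, t:b, l:r] += denoise_tile(latent[:, :, t:b, l:r])
            count[:, :, t:b, l:r] += 1.0   # count contributions per latent pixel
    return out / count                     # every pixel is covered at least once
```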
Try this one: https://github.com/dustysys/ddetailer.git
I'm sorry for the accidental wrong edit.
The key point is that I need a user interface for drawing bounding boxes, so that you can draw rectangles and control MultiDiffusion with different prompts. That way the results should get much better.
Why? Because then you can just select the woman's face and tell SD to draw a beautiful woman's face. SD will try its best, using its full 512 × 512 resolution to draw ONLY a face. The effective resolution will be unprecedentedly high for an SD model, since it is dedicated to drawing just one part of the image at the best of its capabilities.
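A hedged sketch of this bbox idea, where each user-drawn rectangle is redrawn with its own prompt and pasted over a background pass. `denoise` is a placeholder for a full conditioned sampling call; nothing here is the extension's real API:

```python
# Illustrative region-prompt pass: each rectangle gets a dedicated prompt and
# the model's full capacity, then overwrites that area of the background pass.
import torch

def region_pass(latent: torch.Tensor, regions, denoise) -> torch.Tensor:
    """regions: list of (top, left, height, width, prompt) in latent units."""
    out = denoise(latent, prompt="clear, very clear, ultra clear")  # background pass
    for top, left, h, w, prompt in regions:
        crop = latent[:, :, top:top + h, left:left + w]
        # Redraw just this rectangle with its own dedicated prompt.
        out[:, :, top:top + h, left:left + w] = denoise(crop, prompt=prompt)
    return out

# e.g. dedicate an entire pass to nothing but the face:
# region_pass(latent, [(8, 40, 64, 64, "a beautiful woman's face")], denoise)
```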
However, when I was adding features I ran into this f**king issue: https://github.com/gradio-app/gradio/issues/2316
Someone submitted a PR for a bbox tool, but the maintainers declined to merge it: https://github.com/gradio-app/gradio/pull/3220
I don't know what they were thinking when they rejected such a good PR (from my perspective) without providing their own solution. It has been half a year since it was first proposed.
So it will be hard to draw rectangles on images directly. I must find another way to draw rectangles. Do you have any other ideas?
So it will be hard to draw rectangles on images directly. I must find another way to draw rectangles. Do you have any other ideas?
Check out this extension: https://github.com/hnmr293/sd-webui-llul
It fakes it by having you move a rectangle around in a separate window.

https://www.reddit.com/r/StableDiffusion/comments/11pyiro/new_feature_zoom_enhance_for_the_a111_webui/
New Feature: "ZOOM ENHANCE" for the A111 WebUI. Automatically fix small details like faces and hands!
Hello, fellow Stable Diffusion users! I'm excited to share with you a new feature that I've added to the Unprompted extension: the [zoom_enhance] shortcode.
If you're not familiar with Unprompted, it's a powerful extension that lets you use various shortcodes in your prompts to enhance your text generation experience. You can learn more about it here.
The [zoom_enhance] shortcode is inspired by the fictional technology from CSI, where they can magically zoom in on any pixelated image and reveal crisp details. Of course, this is not possible in real life, but we can get pretty close with Stable Diffusion and some clever tricks.
The shortcode allows you to automatically upscale small details within your image where Stable Diffusion tends to struggle. It is particularly good at fixing faces and hands in long-distance shots.
How does it work?
The [zoom_enhance] shortcode searches your image for specified target(s), crops out the matching regions, and processes them through [img2img]. It then blends the result back into your original image. All of this happens behind the scenes without adding any unnecessary steps to your workflow. Just set it and forget it.
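The crop, redraw, and blend-back loop described above can be sketched with PIL as follows. `img2img` and the target `box` are placeholders (the real shortcode finds the box with clipseg); only the feathered blend is shown concretely:

```python
# Illustrative reimplementation of the crop -> img2img -> blend-back loop.
from PIL import Image, ImageDraw, ImageFilter

def zoom_enhance(image: Image.Image, box, img2img, feather: int = 8) -> Image.Image:
    """Crop `box` from `image`, redraw it via `img2img`, and paste it back
    with gaussian-feathered edges so the seam disappears."""
    crop = image.crop(box)
    # Redraw the region at the model's native resolution, then scale back down.
    fixed = img2img(crop.resize((512, 512))).resize(crop.size)
    # Feathered mask: white interior fading to black at the borders.
    mask = Image.new("L", crop.size, 0)
    ImageDraw.Draw(mask).rectangle(
        (feather, feather, crop.size[0] - feather, crop.size[1] - feather), fill=255)
    mask = mask.filter(ImageFilter.GaussianBlur(feather))
    image.paste(fixed, box[:2], mask)
    return image
```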
Features and Benefits
- Great in both txt2img and img2img modes.
- The shortcode is powered by the [txt2mask] implementation of clipseg, which means you can search for literally anything as a replacement target, and you get access to the full suite of [txt2mask] settings, such as "padding" and "negative_mask."
- It's also pretty good at deepfakes. Set mask="face" and replacement="another person's face" and check out the results.
- It applies a gaussian blur to the boundaries of the upscaled image, which helps it blend seamlessly with the original.
- It is equipped with Dynamic Denoising Strength, which is based on a simple idea: the smaller your replacement target, the worse it probably looks. Think about it: when you generate a character who's far away from the camera, their face is often a complete mess. So the shortcode uses a high denoising strength for small objects and a low strength for larger ones (see the sketch after this list).
- It is significantly faster than Hires Fix and won't mess up the rest of your image.
- Compatible with A1111's color correction setting.
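The Dynamic Denoising Strength idea from the list above fits in a few lines. The shortcode's actual curve isn't documented in this thread, so the linear ramp and bounds below are assumptions:

```python
# Hedged sketch of Dynamic Denoising Strength: smaller targets get redrawn
# harder. The real shortcode's exact curve isn't shown here, so the linear
# ramp and the lo/hi bounds are assumptions for illustration only.
def dynamic_denoise(region_area: int, image_area: int,
                    lo: float = 0.2, hi: float = 0.65) -> float:
    """Interpolate denoising strength from region size: tiny region -> hi, large -> lo."""
    frac = min(region_area / image_area, 1.0)
    return hi - (hi - lo) * frac

# A face covering 2% of the frame gets ~0.64; one covering half gets ~0.43.
print(dynamic_denoise(20_000, 1_000_000))   # ≈ 0.641
print(dynamic_denoise(500_000, 1_000_000))  # ≈ 0.425
```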
How to use it?
To use this feature, you need to have Unprompted installed on your WebUI. If you don't have it yet, you can get it from here.
Once you have Unprompted, simply add the [zoom_enhance] shortcode anywhere in your prompt.
I have investigated a new technique, DDNM (https://github.com/wyhuai/DDNM), that is very powerful for super-resolution, and it is also compatible with MultiDiffusion. Through initial tests I found it amazing. I believe this can beat their new feature in a compelling way.
The automatic mask technology doesn't seem very compatible with MultiDiffusion txt2img, but I will try it in img2img.
How long does it take you to upscale a photo, and how can it be made faster? Here are my settings:
I have investigated a new technique, DDNM (https://github.com/wyhuai/DDNM), that is very powerful for super-resolution, and it is also compatible with MultiDiffusion. Through initial tests I found it amazing. I believe this can beat their new feature in a compelling way.
The automatic mask technology doesn't seem very compatible with MultiDiffusion txt2img, but I will try it in img2img.
Really impressive. Do you know of a user-friendly UI for DDNM? MultiDiffusion is a great idea, btw.