[enhancement]: Suggestion to improve the Hires Fix logic for SD 1.5.

Open StellarBeing25 opened this issue 1 year ago • 12 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Contact Details

No response

What should this feature add?

When I generate images at 1024 x 1536 (2:3), the initial image is generated at 416 x 624. The smaller side falls well below 512 pixels, which is not an ideal resolution for SD 1.5 finetunes, since the majority are trained on images above 512 pixels and can work perfectly fine up to 768 x 768. Please modify the Hires Fix logic to generate the initial image with a minimum resolution of 512 on the smaller side. For example, if the dimensions specified by the user are 1024 x 1536 (2:3), the initial image should be generated at 512 x 768. Fall back to the current logic only when the larger side of the initial image would exceed 768 pixels, which only happens at aspect ratios such as 16:9 or 1:2 that are quite uncommon in the SD community.
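The requested rule can be sketched in a few lines of Python. This is only an illustration, not InvokeAI's actual code: `area_based_size` is an assumed stand-in for the current logic (it keeps the initial image at roughly 512 x 512 worth of pixels and rounds down to a multiple of 8, which happens to reproduce the 416 x 624 figure above), and `proposed_size`, `min_side`, and `max_side` are hypothetical names.

```python
import math


def round_down(value: float, multiple: int = 8) -> int:
    """Round down to the nearest multiple (8 is an assumption)."""
    whole = int(value)
    return whole - whole % multiple


def area_based_size(width: int, height: int, base: int = 512) -> tuple[int, int]:
    """Assumed stand-in for the current logic: keep roughly base*base pixels."""
    aspect = width / height
    area = base * base
    if aspect > 1.0:
        init_h = math.sqrt(area / aspect)
        init_w = init_h * aspect
    else:
        init_w = math.sqrt(area * aspect)
        init_h = init_w / aspect
    return round_down(init_w), round_down(init_h)


def proposed_size(width: int, height: int,
                  min_side: int = 512, max_side: int = 768) -> tuple[int, int]:
    """Pin the smaller side to min_side; fall back if the larger side exceeds max_side."""
    scale = min_side / min(width, height)
    init_w, init_h = width * scale, height * scale
    if max(init_w, init_h) > max_side:
        # Wide or tall targets (e.g. 16:9) use the assumed current logic instead.
        return area_based_size(width, height)
    return round_down(init_w), round_down(init_h)


print(area_based_size(1024, 1536))  # (416, 624)
print(proposed_size(1024, 1536))    # (512, 768)
```

A 1920 x 1080 target would need a 910-pixel long side to keep the short side at 512, so under this sketch it takes the fallback path.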

Alternatives

No response

Additional Content

No response

Jul 03 '24 10:07 StellarBeing25

This makes sense.

Jul 03 '24 12:07 hipsterusername

I'd just like to see the "Multiplier" field from the Ideal Size node exposed in the linear UI, rather than forcing a specific initial size. In this case, I believe setting the minimum dimension to 512 is not necessarily ideal, but I also agree that the current behavior is often not ideal either.

Just adding that multiplier input would let people scale it up or down to fit whatever composition and aspect ratio they are working with.
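As a rough illustration of how a multiplier could cover the original request, the sketch below estimates the multiplier that would put the short side at a chosen value. It assumes the multiplier scales the model's base dimension and that the initial size preserves the scaled area, ignoring rounding and any minimum-dimension clamp; the function name `multiplier_for_short_side` is hypothetical, and the real node may behave differently.

```python
import math


def multiplier_for_short_side(width: int, height: int,
                              short_side: int = 512, base: int = 512) -> float:
    """Estimate the multiplier that yields the requested short side.

    Assumes short side = base * multiplier * sqrt(short / long),
    i.e. area-preserving scaling around the base dimension.
    """
    short, long = sorted((width, height))
    return short_side / (base * math.sqrt(short / long))


print(round(multiplier_for_short_side(1024, 1536), 3))  # 1.225
```

Because any real implementation rounds sizes down to a fixed step, a slightly larger multiplier than the estimate may be needed to land exactly on 512 x 768.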

Jul 03 '24 13:07 dwringer

Setting the minimum dimension to 512 produces duplication and other artifacts when your target image is extra wide, e.g. 2.4:1 aspect. Exposing the multiplier makes far more sense to me so one can do that if they wish, but the default calculations still make sense to avoid those artifacts. The premise that fine-tunes were done with higher resolution imagery is largely irrelevant since the 1.5 models have a native resolution of 512x512; the composition of images in training data is what allows some models to function better at higher resolution initial generations.
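To put numbers on the wide-aspect concern, this small sketch computes the long side and the pixel count relative to a 512 x 512 native image when the short side is pinned to 512. The function name and parameters are illustrative only, and the link between pixel count and duplication artifacts is the commenter's claim, not something the code demonstrates.

```python
def pinned_short_side(aspect: float, short_side: int = 512,
                      native: int = 512) -> tuple[float, float]:
    """Return (long_side, area_ratio) when the short side is pinned.

    aspect is long/short or short/long; either orientation is accepted.
    area_ratio compares the pixel count against native * native.
    """
    ratio = max(aspect, 1.0 / aspect)
    long_side = short_side * ratio
    area_ratio = (short_side * long_side) / (native * native)
    return long_side, area_ratio


for aspect in (1.5, 2.4):
    long_side, area_ratio = pinned_short_side(aspect)
    print(f"{aspect}:1 -> long side {long_side:.1f}, {area_ratio:.2f}x native pixels")
```

At 2.4:1 the long side comes out near 1229 pixels, about 2.4 times the pixel count of a 512 x 512 image, compared with 1.5 times for 2:3.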

Jul 03 '24 13:07 JPPhoto

> Setting the minimum dimension to 512 produces duplication and other artifacts when your target image is extra wide, e.g. 2.4:1 aspect. Exposing the multiplier makes far more sense to me so one can do that if they wish, but the default calculations still make sense to avoid those artifacts. The premise that fine-tunes were done with higher resolution imagery is largely irrelevant since the 1.5 models have a native resolution of 512x512; the composition of images in training data is what allows some models to function better at higher resolution initial generations.

In the case of such aspect ratios, where the larger side of the initial image needs to exceed 768 pixels in order to keep the smaller side at 512, InvokeAI could fall back to the current logic as suggested above. Such wide resolutions are also quite uncommon in the SD community. Moreover, many SD 1.5 fine-tuners suggest staying between 512 and 768 to get the best output from their models.

Jul 03 '24 14:07 StellarBeing25

How about we stop holding the user’s hand and go back to the old UI, maybe with some new visual indicators of ideal sizes?

Jul 03 '24 14:07 ufuksarp

> In the case of such aspect ratios, where the larger side of the initial image needs to exceed 768 pixels in order to keep the smaller side at 512, InvokeAI could fall back to the current logic as suggested above. Such wide resolutions are also quite uncommon in the SD community. Moreover, many SD 1.5 fine-tuners suggest staying between 512 and 768 to get the best output from their models.

This problem is already solved with the multiplier option of the node. It simply needs to be exposed in the UI - or you could of course use the workflow from an existing image and alter it to your liking by using the multiplier or calculating values you like better or hardcoding things in.

I don't see a reason that 768 should be the hard point at which logic shifts. There's nothing about the model that indicates 768 is a special number to use. Thus, I believe this change to the default workflows or the base node would be detrimental to most users and generations as it completely depends on your model and subject.

Jul 03 '24 15:07 JPPhoto

> > In the case of such aspect ratios, where the larger side of the initial image needs to exceed 768 pixels in order to keep the smaller side at 512, InvokeAI could fall back to the current logic as suggested above. Such wide resolutions are also quite uncommon in the SD community. Moreover, many SD 1.5 fine-tuners suggest staying between 512 and 768 to get the best output from their models.

> This problem is already solved with the multiplier option of the node. It simply needs to be exposed in the UI - or you could of course use the workflow from an existing image and alter it to your liking by using the multiplier or calculating values you like better or hardcoding things in.

> I don't see a reason that 768 should be the hard point at which logic shifts. There's nothing about the model that indicates 768 is a special number to use. Thus, I believe this change to the default workflows or the base node would be detrimental to most users and generations as it completely depends on your model and subject.

I don't understand why they don't simply expose the resolution multiplier in the Hires Fix UI instead of using such complex logic, which makes it even more confusing for the user. Using ControlNet becomes confusing, since it is recommended to use ControlNet images of the same resolution as the initial image generated, which is hidden from the user. The image generated from the same seed also becomes completely different when Hires Fix is enabled. It will become even more difficult for the developers to implement Hires Fix for SDXL and other future models, since each will require different logic for initial image generation under the current implementation.

Jul 03 '24 16:07 StellarBeing25

> I don't understand why they don't simply expose the resolution multiplier in the Hires Fix UI instead of using such complex logic, which makes it even more confusing for the user. Using ControlNet becomes confusing, since it is recommended to use ControlNet images of the same resolution as the initial image generated, which is hidden from the user. The image generated from the same seed also becomes completely different when Hires Fix is enabled. It will become even more difficult for the developers to implement Hires Fix for SDXL and other future models, since each will require different logic for initial image generation under the current implementation.

The UI provides a simplistic interface that's suitable for many people to make images. I think the more complex use cases really call for the use of custom workflows. To that end, Hi-Res Fix is a convenience setting that helps to generate coherent images. It's not meant to work in conjunction with every possible combination of settings and options; using regions, for example, will disable Hi-Res Fix, otherwise all regions would have to be scaled down accordingly and the generated workflow would balloon in complexity.

You can always put in your own values for initial generation resolution, turn Hi-Res Fix off, generate, and then manually set up a second generation at the target size using that initial image that you just generated. If I didn't use workflows, that's what I'd do.

Jul 03 '24 16:07 JPPhoto

I think it would be a good idea to have "low res pass" resolution settings in an advanced foldout that normally lock to the final aspect ratio but can be unlocked and set manually, similar to how locking to an aspect ratio already auto-updates one dimension when you adjust the other.

An additional benefit to having manual control of the low res is avoiding non-64 resolutions. We have identified that, even though diffusers supports any multiple of 8, generating or running img2img on non-64 multiples causes a fringing artifact on the bottom and right edges. With the low res automatically selected, there is no way for the user to directly avoid that.
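A minimal sketch of the kind of snapping that manual control would make possible, assuming a simple round-down to the nearest multiple of 64. The helper names are hypothetical, and whether rounding down or to the nearest multiple is preferable is left open.

```python
def snap_down(value: int, multiple: int = 64) -> int:
    """Round an integer dimension down to the nearest multiple."""
    return value - value % multiple


def snap_size(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Snap both dimensions down to the same multiple."""
    return snap_down(width, multiple), snap_down(height, multiple)


print(snap_size(416, 624))  # (384, 576)
print(snap_size(512, 768))  # (512, 768)
```

The 416 x 624 size mentioned at the top of the thread is not a multiple of 64 on either side, while 384 x 576 and 512 x 768 both are, and both keep the 2:3 ratio.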

Jul 03 '24 16:07 dunkeroni

> I think it would be a good idea to have "low res pass" resolution settings in an advanced foldout that normally lock to the final aspect ratio but can be unlocked and set manually, similar to how locking to an aspect ratio already auto-updates one dimension when you adjust the other.

Isn't this more complicated than the linear UI needs to be? I'd bet that for most people its defaults work just fine. Would be interesting to poll or otherwise get that information.

Jul 03 '24 17:07 JPPhoto

> Isn't this more complicated than the linear UI needs to be? I'd bet that for most people its defaults work just fine. Would be interesting to poll or otherwise get that information.

I am of the opinion that the linear UI only needs to remain simple outside of the Advanced foldouts, and then it should have the capability of being as complicated as is helpful. Highres fix is a very common operation with a limitation that a fairly simple additional setting would solve. Other UIs also started with automatic sizing and eventually moved on to allow manual control. We don't have to require manual control, or even show it in the linear UI by default, but it would be nice to allow it.

Jul 03 '24 17:07 dunkeroni

HRF was added by a contributor; it can certainly be extended to add manual resolution flexibility as well. Aside from the general potential for footgunning, I think it's a good "first issue"-level addition.

Jul 03 '24 19:07 hipsterusername

> The UI provides a simplistic interface that's suitable for many people to make images. I think the more complex use cases really call for the use of custom workflows. To that end, Hi-Res Fix is a convenience setting that helps to generate coherent images. It's not meant to work in conjunction with every possible combination of settings and options; using regions, for example, will disable Hi-Res Fix, otherwise all regions would have to be scaled down accordingly and the generated workflow would balloon in complexity.

> You can always put in your own values for initial generation resolution, turn Hi-Res Fix off, generate, and then manually set up a second generation at the target size using that initial image that you just generated. If I didn't use workflows, that's what I'd do.

As a user, a few things you say here don't really fall in line with the current state of AI generation. High-res fix is standard in every workflow except Flux and now SD3 (which, coincidentally, still benefit massively from it). Hiding it and its options behind custom workflows is working backwards and is absolutely making it harder for end users to even consider this app as the go-to AI generation app. The initial decision to hide it in the redo of the main UI was, quite frankly, a massive ball drop, and trying to justify continuing to hide it seems silly and shortsighted. High-res fix greatly improves the output quality of LoRAs and is not just a way to avoid doubling or improve coherency, which is my biggest gripe with losing its controls in the main UI. My hope is to see it brought back the way it was before.

Dec 15 '24 16:12 zethfoxster