My custom implementation in Automatic1111's WebUI
Dear authors,
I have implemented your algorithm in Automatic1111's WebUI with the following optimizations:
- Cropping views in a more symmetric way to get better results.
- Pre-calculating weights to save time (weights won't change once the views are determined).
- Batched latent-view processing for acceleration (see the sketch after this list).
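For reference, here is a minimal sketch of the pre-computed weights and batched-view ideas. This is a simplified illustration, not the extension's actual code: `denoise_fn`, the window/stride values, and the batch size are placeholders.

```python
import torch

def make_views(height, width, window=96, stride=64):
    """Evenly spaced (symmetric) crop coordinates covering the latent.
    Assumes the latent is at least `window` in each dimension."""
    hs = list(range(0, height - window + 1, stride))
    ws = list(range(0, width - window + 1, stride))
    if hs[-1] != height - window:  # make the last views touch the border
        hs.append(height - window)
    if ws[-1] != width - window:
        ws.append(width - window)
    return [(h, w) for h in hs for w in ws]

def precompute_weights(views, height, width, window, device):
    """Count how many views cover each latent pixel. Computed once,
    since the views never change between denoising steps."""
    weights = torch.zeros(1, 1, height, width, device=device)
    for h, w in views:
        weights[..., h:h + window, w:w + window] += 1
    return weights

@torch.no_grad()
def multidiffusion_step(latent, views, weights, denoise_fn, window, batch_size=4):
    """One fused step: denoise views in batches, then weight-average overlaps."""
    out = torch.zeros_like(latent)
    for i in range(0, len(views), batch_size):
        chunk = views[i:i + batch_size]
        batch = torch.cat([latent[..., h:h + window, w:w + window]
                           for h, w in chunk])
        denoised = denoise_fn(batch)  # per-view UNet prediction (placeholder)
        for j, (h, w) in enumerate(chunk):
            out[..., h:h + window, w:w + window] += denoised[j:j + 1]
    return out / weights
```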
Some WebUI-related additions:
- Compatibility with all samplers.
- Compatibility with ControlNet.
Here is the link:
- https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111
Many thanks for your fantastic work, especially on img2img and panorama generation! We are working on text prompts now.
However, uncontrolled large-image generation is not ideal at all: repeated patterns always appear, and the image is mostly unusable.
Could you please give us some insight into whether we can generate large images without a user-specified prompt mask?
For example, I have an idea (without proof): we could generate a small reference image first, obtain the prompt attention maps, scale them up to the larger resolution, and finally assign each prompt to its correct views automatically during MultiDiffusion.
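To make the idea concrete, here is a rough, unverified sketch. Everything in it is hypothetical: `attn_maps` is assumed to be a dict of per-token cross-attention maps collected from the small reference generation (collecting them requires hooking the UNet's cross-attention layers, which is not shown), and the coverage-based assignment rule is just one possible choice.

```python
import torch
import torch.nn.functional as F

def prompt_masks_from_reference(attn_maps, target_hw, threshold=0.5):
    """attn_maps: dict token -> 2D cross-attention map from a small
    reference generation. Returns one boolean region mask per token
    at the large latent resolution."""
    masks = {}
    for token, amap in attn_maps.items():
        amap = amap / amap.max().clamp(min=1e-8)  # normalize to [0, 1]
        big = F.interpolate(amap[None, None], size=target_hw,
                            mode="bilinear", align_corners=False)[0, 0]
        masks[token] = big > threshold  # coarse region for this token
    return masks

def prompt_for_view(masks, view, window, default_prompt):
    """Assign a view the token whose mask covers it most
    (a hypothetical assignment rule)."""
    h, w = view
    best, best_cov = default_prompt, 0.0
    for token, mask in masks.items():
        cov = mask[h:h + window, w:w + window].float().mean().item()
        if cov > best_cov:
            best, best_cov = token, cov
    return best
```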
Thank you very much!
Thank you for implementing MultiDiffusion with the WebUI -- looks great!
Regarding larger images -- in the simplest setting of using the same prompt for all views, it may almost by definition be unsuitable for certain prompts/resolutions (e.g., when generating a single object that should not appear in every view). I think that a coarse-to-fine generation approach could help with this.
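One possible reading of the coarse-to-fine suggestion, as a sketch (the function names, scale, and `strength` value are all assumptions, and this may not be exactly what the authors have in mind):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def coarse_to_fine(generate_fn, refine_fn, base_hw=(64, 64), scale=2, strength=0.5):
    """Fix the global layout at low resolution, then let a MultiDiffusion
    img2img-style pass only add detail at high resolution.
    generate_fn(h, w) -> coarse latent; refine_fn(latent, strength) ->
    refined latent (both placeholders)."""
    coarse = generate_fn(*base_hw)                   # global structure
    big = F.interpolate(coarse, scale_factor=scale,  # upscale the latent
                        mode="bilinear", align_corners=False)
    # Partial re-noising keeps the coarse layout; the refinement pass
    # fills in detail instead of inventing new (repeated) objects.
    return refine_fn(big, strength=strength)
```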