ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
Model/Pipeline/Scheduler description
ResAdapter supports image generation at arbitrary resolutions for community models. The results are quite interesting.
Project page: https://res-adapter.github.io/
Github: https://github.com/bytedance/res-adapter
Open source status
- [X] The model implementation is available.
- [X] The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
No response
Hey, I would also like to work on this implementation. Have you already tried ResAdapter?
Unfortunately, I also intended to work on this, if possible.
Hey, thanks for suggesting this! It indeed has very cool results. I'm not sure if there's anything to implement here, no? The resolution LoRAs can be loaded like normal LoRAs, and there's no new modelling code involved AFAICT; it's just a matter of where the LoRAs are applied. Their example code works out of the box with diffusers as well. Maybe a training script would be interesting, as they do not plan to release theirs due to company decisions, as mentioned in [this] issue on their repo.
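For reference, here's a minimal loading sketch showing what "works out of the box" means in practice. The Hub repo id, weight file name, and adapter name are my assumptions based on the project page, not confirmed details; the released checkpoints may use a different layout:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# ResAdapter ships as a LoRA, so the standard loader applies.
pipe.load_lora_weights(
    "jiaxiangc/res-adapter",                         # assumed Hub repo id
    weight_name="pytorch_lora_weights.safetensors",  # assumed file name
    adapter_name="res_adapter",
)
pipe.set_adapters(["res_adapter"], adapter_weights=[1.0])

# Generate at a resolution the base model normally handles poorly.
image = pipe(
    "a photograph of an astronaut riding a horse",
    width=384, height=384,
).images[0]
```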
@a-r-r-o-w Sure, I think we can work on a training script for this model. However, I'm not sure we can reproduce the results, as it was trained on LAION-5B.
@rootonchair @a-r-r-o-w @PacificG Thanks for your attention.
For inference, we already provide a Hugging Face demo and a Replicate demo that you can use. We will also support ComfyUI.
For training, it is actually easy to reproduce if you want to do it. Here is some advice:
- For the dataset, make sure you can sample images from different resolutions and buckets. You can reproduce the results with any dataset, because ResAdapter does not capture style information from the dataset.
- For the model architecture, insert LoRA layers and unfreeze the group norm layers in the ResNet blocks.
- During training, unfreeze group norm when the resolution is > 512 and freeze it when the resolution is <= 512. The LoRA layers are always trained.
- During training, you can write a probability function to sample different resolutions (see the sketch below).
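A minimal sketch of the resolution sampling and freeze/unfreeze schedule described above; the bucket list and probabilities are illustrative assumptions, not the paper's exact values:

```python
import random
import torch.nn as nn

# Illustrative resolution buckets and sampling weights (assumed values).
RESOLUTION_BUCKETS = [256, 384, 512, 768, 1024]
BUCKET_PROBS = [0.15, 0.15, 0.2, 0.25, 0.25]

def sample_resolution() -> int:
    # The "probability function to choose different resolutions".
    return random.choices(RESOLUTION_BUCKETS, weights=BUCKET_PROBS, k=1)[0]

def set_groupnorm_trainable(unet: nn.Module, trainable: bool) -> None:
    # Open/close the group norm layers depending on the sampled resolution.
    for module in unet.modules():
        if isinstance(module, nn.GroupNorm):
            for p in module.parameters():
                p.requires_grad_(trainable)

# Inside the training loop (LoRA parameters stay trainable at every step):
#   resolution = sample_resolution()
#   set_groupnorm_trainable(unet, trainable=resolution > 512)
```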
If you are interested in our work, feel free to try it.
Best,
Thank you for your awesome work @jiaxiangc! I just finished reading the paper and think that the provided details would be enough for me to replicate the results, which I'll hopefully try working on in the next few days. I couldn't find info about why the group norm layers should be frozen for lower resolutions. Could you explain? The reasoning mentioned in the paper is for it to adapt to the statistical distribution of feature maps of high-res images, but shouldn't the equivalent case be true for low-res images too (say 256x256)? Since most, if not all, SD models currently fail at lower resolutions. I will try experimenting to understand more.
I might also bother you with questions and for reviewing the implementation once completed :)
@a-r-r-o-w Taking SD1.5 as an example, we initially experimented with group norm and LoRA training both turned on across resolutions from 128 to 1024. We found that at resolutions < 512, inserting only LoRA can still achieve good results. Again, because group norm essentially only captures a mean and a variance, it cannot fit the resolution information from 128 to 1024 all at once the way LoRA can. Therefore, we only turn on group norm training when the resolution is larger than 512.
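To make the "group norm is essentially mean and variance" point concrete, here is a tiny illustration (the channel count is arbitrary, chosen only for the example). GroupNorm recomputes the mean/variance from each input, so its only learned state is one scale and one shift per channel, which has to fit a single statistical regime:

```python
import torch.nn as nn

# GroupNorm's learnable state is just a per-channel affine; a single
# (weight, bias) pair per channel cannot simultaneously match the feature
# statistics of 128px and 1024px inputs the way LoRA layers can.
gn = nn.GroupNorm(num_groups=32, num_channels=320)
print(sum(p.numel() for p in gn.parameters()))  # 640 = 320 scales + 320 shifts
```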
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.