[Pipelines] Support for T2I-Adapter

Open wfng92 opened this issue 2 years ago • 3 comments

Model/Pipeline/Scheduler description

From the official repository, T2I-Adapter by @TencentARC is

... a simple and small (~70M parameters, ~300M storage space) network that can provide extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models. T2I-Adapter aligns internal knowledge in T2I models with external control signals. We can train various adapters according to different conditions, and achieve rich control and editing effects.

It would be great to have these plug-and-play adapters in the diffusers module.

Open source status

  • [X] The model implementation is available
  • [X] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

  • Original code: https://github.com/TencentARC/T2I-Adapter
  • Pre-trained models: https://huggingface.co/TencentARC/T2I-Adapter

wfng92 avatar Feb 17 '23 05:02 wfng92

I have made a quick attempt to implement the T2I-Adapter in diffusers, which can be found over here. Based on the results obtained with the t2iadapter_seg_sd14v1.pth adapter, it appears to be working correctly. (Screenshot from 2023-02-19 15-09-51)

The adapter module itself is quite simple, so I think the main considerations for integrating the adapter into diffusers will be:

  • Should the design of the adapter module allow it to inject the adapter hidden state into any layer inside the UNet? (In the official implementation, the adapter state is always added after the last ResnetBlock of each downsample block.)
  • Should the adapter be integrated into the UNet, since the number and size of the feature maps the adapter outputs all depend on the UNet model it is working with?
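To make the second point concrete, here is a minimal sketch of the shape dependency. It is not code from the official repository or from diffusers: the helper names `expected_adapter_shapes` and `check_adapter_outputs` are hypothetical, and the channel counts and latent size are illustrative values in the style of a Stable Diffusion v1 UNet. It also assumes each down block halves the spatial size for the next block and that the adapter state is added before that downsampling step.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def expected_adapter_shapes(
    block_out_channels: Sequence[int], latent_size: int
) -> List[Shape]:
    """Feature-map shape expected at the end of each UNet down block.

    Assumption: each block passes a half-resolution feature map to the next
    block, and the adapter state is added before that downsampling happens.
    """
    shapes: List[Shape] = []
    size = latent_size
    for channels in block_out_channels:
        shapes.append((channels, size, size))
        size //= 2  # resolution seen by the next down block
    return shapes


def check_adapter_outputs(
    adapter_shapes: Sequence[Shape], unet_shapes: Sequence[Shape]
) -> None:
    """Raise if the adapter outputs cannot be added to the UNet features."""
    if list(adapter_shapes) != list(unet_shapes):
        raise ValueError(
            f"adapter outputs {list(adapter_shapes)} do not match "
            f"UNet features {list(unet_shapes)}"
        )


# Illustrative configuration (assumed, not read from a real checkpoint).
shapes = expected_adapter_shapes((320, 640, 1280, 1280), latent_size=64)
print(shapes)
```

Whichever side owns this check, the adapter has to be built (or validated) against the configuration of one specific UNet, which is the coupling the question is about.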

HimariO avatar Feb 19 '23 07:02 HimariO

Thanks so much for your hard work, @HimariO! Your questions are quite valid. Do you think the design philosophy behind how we integrated LoRA into diffusers would be of help (PR)? I mention it because LoRA also belongs to the adapter family of neural networks.

Let me see what other members think.

Cc: @patil-suraj @patrickvonplaten @williamberman

sayakpaul avatar Feb 20 '23 08:02 sayakpaul

@sayakpaul, thank you for directing me to the LoRA PR. It has been very helpful in giving me a general idea of the design philosophy of similar features. After reviewing the PR for LoRA and the draft PR for ControlNet, I believe we can create a more versatile API that supports T2I-Adapter, ControlNet, and other similar modules that take independent inputs and inject their outputs into the diffusion model. A PoC can be found here.

HimariO avatar Feb 21 '23 17:02 HimariO

Agreed. Since T2I-Adapter and ControlNet (which share similar designs) both require some changes to the UNet, adding more pipelines like them in the future may lead to conflicts. It is necessary to consider how to merge them efficiently into one framework.

haofanwang avatar Feb 21 '23 18:02 haofanwang

@HimariO I left a comment directly on your commit. Thanks so much!

We usually consider the code-level impact we might have before accommodating a large change in the API. So, I request @patil-suraj @williamberman @patrickvonplaten @yiyixuxu to chime in here too.

Note that this is a lighter-than-usual week for us, so there might be some delay in our response.

sayakpaul avatar Feb 22 '23 02:02 sayakpaul

My PR for ControlNet (#2407) has been open for a while now. I am also in favor of the Sideload-related changes. The T2I-Adapter and ControlNet share a similar concept in that they both interfere with UNet. I think that the Sideload concept could be a common foundation and have good potential for future extensions. (I have left a comment on the ControlNet thread.)

takuma104 avatar Feb 22 '23 16:02 takuma104

My understanding from a preliminary read of the T2I-Adapter paper is that the outputs of the adapter model are simply added to the intermediate features of the UNet encoder. This shouldn't require any hacking of the existing block definitions and could be done just by passing the adapter outputs to the forward method of the UNet.
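As a rough illustration of that idea (a framework-free toy, not the diffusers API), the sketch below uses plain lists of floats as stand-ins for feature maps. The classes `ToyDownBlock` and `ToyUNetEncoder` and the `adapter_states` argument are hypothetical names chosen for this example.

```python
from typing import List, Optional, Sequence

Feature = List[float]  # stand-in for a feature-map tensor


class ToyDownBlock:
    """Stand-in for a UNet down block: scales every value by a fixed factor."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def __call__(self, x: Feature) -> Feature:
        return [v * self.scale for v in x]


class ToyUNetEncoder:
    """Encoder whose forward method optionally accepts adapter outputs."""

    def __init__(self, blocks: Sequence[ToyDownBlock]) -> None:
        self.blocks = list(blocks)

    def forward(
        self, x: Feature, adapter_states: Optional[Sequence[Feature]] = None
    ) -> List[Feature]:
        if adapter_states is not None and len(adapter_states) != len(self.blocks):
            raise ValueError("expected one adapter state per down block")
        features: List[Feature] = []
        for i, block in enumerate(self.blocks):
            x = block(x)
            if adapter_states is not None:
                # Element-wise addition of the adapter state to the block output;
                # the block definition itself is left untouched.
                x = [a + b for a, b in zip(x, adapter_states[i])]
            features.append(x)
        return features


encoder = ToyUNetEncoder([ToyDownBlock(2.0), ToyDownBlock(0.5)])
plain = encoder.forward([1.0, 2.0])
guided = encoder.forward([1.0, 2.0], adapter_states=[[0.5, 0.5], [1.0, 1.0]])
print(plain)   # [[2.0, 4.0], [1.0, 2.0]]
print(guided)  # [[2.5, 4.5], [2.25, 3.25]]
```

Only the encoder's forward signature changes here; the trade-off is that every model that should accept adapter guidance needs the extra argument.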

williamberman avatar Feb 23 '23 21:02 williamberman

@williamberman your understanding is correct, and what you describe is exactly what I did in my first prototype. The main motivation for trying out new concepts like sideloading is to avoid modifying every sub-module that the adapter/ControlNet-like model interacts with, especially when those modules are buried deep in the module hierarchy or when different adapter variations target different modules.
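A minimal sketch of what such sideloading could look like, assuming a simple output-hook mechanism; this is not the PoC linked earlier in the thread, and `HookableBlock`, `register_output_hook`, and `sideload` are hypothetical names. In PyTorch, a comparable effect can be obtained with `torch.nn.Module.register_forward_hook`, whose hook may return a replacement output.

```python
from typing import Callable, Dict, List

Feature = List[float]  # stand-in for a feature-map tensor
Hook = Callable[[Feature], Feature]


class HookableBlock:
    """Toy block whose output can be post-processed without editing the block."""

    def __init__(self, scale: float) -> None:
        self.scale = scale
        self._hooks: List[Hook] = []

    def register_output_hook(self, hook: Hook) -> Callable[[], None]:
        """Attach a hook and return a function that detaches it again."""
        self._hooks.append(hook)
        return lambda: self._hooks.remove(hook)

    def __call__(self, x: Feature) -> Feature:
        out = [v * self.scale for v in x]
        for hook in self._hooks:
            out = hook(out)
        return out


def sideload(
    blocks: Dict[str, HookableBlock], states: Dict[str, Feature]
) -> List[Callable[[], None]]:
    """Add each adapter state to the output of the block it targets, by name."""
    removers = []
    for name, state in states.items():
        def add_state(out: Feature, state: Feature = state) -> Feature:
            return [a + b for a, b in zip(out, state)]
        removers.append(blocks[name].register_output_hook(add_state))
    return removers


def run(blocks: Dict[str, HookableBlock], x: Feature) -> Feature:
    for block in blocks.values():
        x = block(x)
    return x


blocks = {"down.0": HookableBlock(2.0), "down.1": HookableBlock(0.5)}
removers = sideload(blocks, {"down.1": [1.0, 1.0]})  # target only one block
guided = run(blocks, [1.0, 2.0])
for remove in removers:
    remove()
restored = run(blocks, [1.0, 2.0])
print(guided, restored)  # [2.0, 3.0] [1.0, 2.0]
```

The caller picks target modules by name, so an adapter variant that targets different modules needs a different `states` mapping rather than a change to the blocks or to the model's forward signature.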

HimariO avatar Feb 25 '23 16:02 HimariO

Thanks for thinking this through, @HimariO! Let us know whenever you're ready with a PR and/or if you need any help.

sayakpaul avatar Feb 28 '23 11:02 sayakpaul

related: https://github.com/cloneofsimo/t2i-adapter-diffusers

AK391 avatar Feb 28 '23 23:02 AK391

Hi @sayakpaul, just a quick note to let you know that I'm planning on creating the PR this week, and I'll let you know if there are any design-related issues that require further discussion. Thanks!

HimariO avatar Mar 01 '23 04:03 HimariO

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Mar 25 '23 15:03 github-actions[bot]