DragonDiffusion
No fine-tuning or extra module is needed, so what are the losses and learning rate for?
Hi, interesting work. As mentioned in the paper:
All content editing and preservation signals in our proposed method come from the image itself. It allows for a direct translation of T2I generation ability in diffusion models to image editing tasks without the need for any model fine-tuning or training.
I'm confused by the losses and even a learning rate defined in Eq. 5 and Eq. 6: what are they used for? Thanks in advance.
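My current guess is that these losses are not training signals at all, but test-time energy guidance: at each denoising step the gradient of the loss nudges the latent, and the "learning rate" is just that guidance step size, so no model weights are ever updated. A minimal sketch under that assumption, with a toy quadratic energy standing in for the paper's feature-matching loss and a hypothetical `guidance_step` helper:

```python
import numpy as np

def guidance_step(z, target, eta):
    # Toy energy L(z) = ||z - target||^2; its analytic gradient is 2 (z - target).
    grad = 2.0 * (z - target)
    # Gradient descent on the latent z itself, not on any network weights.
    return z - eta * grad

# A latent being steered toward an editing target over the sampling steps.
z = np.array([1.0, -1.0])
target = np.array([0.5, 0.0])
for _ in range(50):
    z = guidance_step(z, target, eta=0.1)
# z ends up close to target, with no parameters trained anywhere.
```

Is this roughly what Eq. 5 and Eq. 6 are doing, i.e. the gradients only guide the latents during sampling?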