DragonDiffusion
Question about Cross-branch Self-attention
Hi, thanks for the interesting work! According to Sec. 3.4, W_{Q, K, V} are learnable projection matrices, so does DragonDiffusion need to fine-tune the projection matrices for both branches? Or am I misunderstanding something? If fine-tuning is really needed, is its loss function consistent with the one used for the latent z_t? Thanks!
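To make sure I'm reading Sec. 3.4 correctly, here is a minimal sketch of what I understand the cross-branch self-attention to be: queries come from the generation branch, while keys and values are computed from the guidance branch, with the two branches sharing the same W_{Q, K, V}. All function and variable names here are my own and purely illustrative, not from the repo:

```python
import torch
import torch.nn.functional as F

def cross_branch_self_attention(x_gen, x_guid, W_Q, W_K, W_V):
    """Illustrative sketch (my understanding, not the repo's code):
    queries from the generation-branch features x_gen, keys/values from
    the guidance-branch features x_guid, sharing the projections W_{Q,K,V}.

    x_gen, x_guid: (B, N, C) token features; W_Q, W_K, W_V: (C, d) matrices.
    """
    Q = x_gen @ W_Q                  # queries from the generation branch
    K = x_guid @ W_K                 # keys from the guidance branch
    V = x_guid @ W_V                 # values from the guidance branch
    scale = Q.shape[-1] ** -0.5
    attn = F.softmax(Q @ K.transpose(-2, -1) * scale, dim=-1)
    return attn @ V
```

If this picture is right, my question is whether W_Q, W_K, W_V are simply the pretrained UNet's existing attention projections reused as-is, or whether they are further fine-tuned for the two branches.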