EscherNet
The generated target views become blurry during training
Hi @fradif96
We don't have new modules for cross/self-attention. They are the same attention layers; we just reshape the latent features from ((b t) l d) -> (b (t l) d) here. The trainable layers are the UNet and the ConvNeXt image encoder here (we didn't use CLIP, as it only captures high-level semantics). The VAE is frozen, and we fine-tuned from SD1.5, initialized here.
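
For anyone reading along, here is a minimal sketch of what that reshape does, written with NumPy rather than the repo's actual code. The function names `merge_views` and `split_views` and the toy sizes are made up for illustration; `b` is the batch size, `t` the number of views, `l` the tokens per view, and `d` the feature dimension. It assumes the views of one sample are stored contiguously along the first axis, which is what the `(b t)` grouping implies.

```python
import numpy as np


def merge_views(latents: np.ndarray, num_views: int) -> np.ndarray:
    """((b t) l d) -> (b (t l) d): concatenate the tokens of all views per sample."""
    bt, l, d = latents.shape
    if bt % num_views != 0:
        raise ValueError("first axis must be divisible by num_views")
    b = bt // num_views
    # Views of the same sample are adjacent on axis 0, so a plain reshape suffices.
    return latents.reshape(b, num_views * l, d)


def split_views(tokens: np.ndarray, num_views: int) -> np.ndarray:
    """(b (t l) d) -> ((b t) l d): inverse of merge_views."""
    b, tl, d = tokens.shape
    if tl % num_views != 0:
        raise ValueError("token axis must be divisible by num_views")
    l = tl // num_views
    return tokens.reshape(b * num_views, l, d)


# Toy sizes (hypothetical): 2 samples, 3 views each, 4 tokens per view, 8 features.
b, t, l, d = 2, 3, 4, 8
x = np.arange(b * t * l * d, dtype=np.float32).reshape(b * t, l, d)

merged = merge_views(x, t)          # attention now sees all t*l tokens of a sample
restored = split_views(merged, t)   # back to per-view layout for the other layers
```

With the merged layout, an unmodified attention layer attends across every view of the same sample, which is why no new attention module is needed.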
Hope it helps and please let me know if something is still unclear.
Originally posted by @kxhit in https://github.com/kxhit/EscherNet/issues/1#issuecomment-1935166434