
The generated target views become blurry during training

Open fangchuan opened this issue 5 months ago • 5 comments

Hi @fradif96

We don't have new modules for cross/self-attention. They are the same attention layers; we just reshape the latent features from ((b t) l d) -> (b (t l) d) here. The trainable layers are the UNet and the ConvNeXt image encoder here (we didn't use CLIP, as it only captures high-level semantics). The VAE is frozen, and we fine-tuned from SD1.5, initialized here.
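As a rough illustration of the reshape described above, the sketch below merges the view axis `t` into the token axis `l`, so that one attention call sees the tokens of all views together. It is not taken from the EscherNet code: the function name `merge_views_into_tokens` and the `num_views` parameter are hypothetical, and NumPy is used only to keep the example dependency-light.

```python
import numpy as np


def merge_views_into_tokens(x, num_views):
    """Reshape ((b t), l, d) -> (b, (t l), d).

    Hypothetical helper, not from the EscherNet repository.
    x         : array of shape (b * t, l, d), views stacked along the batch axis
    num_views : t, the number of views per sample (assumed known by the caller)
    """
    bt, l, d = x.shape
    if bt % num_views != 0:
        raise ValueError("leading dimension must be divisible by num_views")
    b = bt // num_views
    # With a contiguous layout, token li of view ti in sample bi
    # ends up at position ti * l + li of sample bi.
    return x.reshape(b, num_views * l, d)


# Toy sizes, chosen only for illustration.
b, t, l, d = 2, 3, 4, 8
x = np.arange(b * t * l * d, dtype=np.float32).reshape(b * t, l, d)
y = merge_views_into_tokens(x, num_views=t)
print(y.shape)  # (2, 12, 8)
```

After this reshape, the unchanged attention layers operate over `t * l` tokens per sample instead of `l` tokens per view, which is consistent with the statement that no new attention modules are needed.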

Hope it helps and please let me know if something is still unclear.

Originally posted by @kxhit in https://github.com/kxhit/EscherNet/issues/1#issuecomment-1935166434

fangchuan, Sep 09 '24 15:09