sd-forge-layerdiffuse
Questions about the architecture of latent transparency decoder
Love your work, and thanks for sharing the code.
I have some questions about the decoder. Could you kindly give me some hints or guidance on it?
Q1: Appendix B (page 22) says: “Then the decoder goes through 64 × 64 × 512 → 128 × 128 × 512 → 256 × 256 × 256 → 512 × 512 × 128 → 512 × 512 × 3”. Should the final output be 512 × 512 × 4 instead?
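To make Q1 concrete, here is a minimal shape trace of the quoted path. It is only a sketch: I am assuming each stage can be summarised as a spatial scale factor plus an output channel count, and the helper name `trace_shapes` is my own, not something from the repository.

```python
# Shape trace for the decoder path quoted from Appendix B.
# Assumption: each stage is described only by (spatial scale, output channels).
# `trace_shapes` is a hypothetical helper, not part of sd-forge-layerdiffuse.

def trace_shapes(start, stages):
    """Return the list of (height, width, channels) after each stage."""
    h, w, c = start
    shapes = [start]
    for scale, out_channels in stages:
        h, w, c = h * scale, w * scale, out_channels
        shapes.append((h, w, c))
    return shapes

# The path as written in the paper: the last stage ends with 3 channels.
stages_as_written = [(2, 512), (2, 256), (2, 128), (1, 3)]
# The variant asked about in Q1: the last stage ends with 4 channels.
stages_four_channels = [(2, 512), (2, 256), (2, 128), (1, 4)]

as_written = trace_shapes((64, 64, 512), stages_as_written)
four_channels = trace_shapes((64, 64, 512), stages_four_channels)

print(" -> ".join("x".join(map(str, s)) for s in as_written))
print(" -> ".join("x".join(map(str, s)) for s in four_channels))
```

The two traces differ only in the channel count of the last stage, which is the part I am unsure about.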
Q2: The input of the decoder is $(x_a, \hat{I})$. Why not input $x_a$ alone, since it already carries all the information that gets decoded? If that does not work, why not?
Q3: Is the U-Net decoder a must? Is it because $\hat{I}_c$ is otherwise too hard to reconstruct, given that the background information is discarded in the premultiplied $I$ but is still required for the reconstruction?
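The premise behind Q3 can be illustrated with a tiny example of premultiplied alpha. This is a sketch of my understanding only, assuming premultiplication means multiplying each color channel by alpha; the `premultiply` helper is hypothetical and is not the repository's code.

```python
# Premultiplied alpha: each color channel is multiplied by alpha.
# Assumption: colors and alpha are floats in [0, 1].
# `premultiply` is a hypothetical helper for illustration only.

def premultiply(rgb, alpha):
    """Return the premultiplied color for one pixel."""
    return tuple(channel * alpha for channel in rgb)

# Two different colors behind fully transparent pixels (alpha = 0).
hidden_red = premultiply((1.0, 0.0, 0.0), 0.0)
hidden_blue = premultiply((0.0, 0.0, 1.0), 0.0)

# A half-transparent red pixel keeps a scaled version of its color.
half_red = premultiply((1.0, 0.0, 0.0), 0.5)

# Both hidden pixels collapse to the same value, so the original
# color cannot be recovered from the premultiplied image alone.
print(hidden_red == hidden_blue)
print(half_red)
```

Where alpha is zero, different original colors become indistinguishable after premultiplication, which is why I suspect the color has to be reconstructed rather than read back.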