sd-forge-layerdiffuse

Questions about the architecture of the latent transparency decoder

Open Xudangliatiger opened this issue 1 year ago • 0 comments

Love your work, thanks for sharing the code.

I have some questions about the decoder. Could you kindly give me some hints or guidance on them?

Q1: Appendix B, page 22, says: “Then the decoder goes through 64 × 64 × 512 → 128 × 128 × 512 → 256 × 256 × 256 → 512 × 512 × 128 → 512 × 512 × 3”. Should the last output be 512 × 512 × 4?
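To make Q1 concrete, here is a minimal sketch that only walks through the shapes quoted above. It does not reflect the actual implementation; the name `RGBA_CHANNELS` and the assumption that the decoder should emit all four RGBA channels in a single tensor are mine.

```python
# Stage shapes (height, width, channels) exactly as quoted from Appendix B.
quoted_stages = [
    (64, 64, 512),
    (128, 128, 512),
    (256, 256, 256),
    (512, 512, 128),
    (512, 512, 3),
]

# Assumption (mine, not from the paper): a transparent output needs
# R, G, B and alpha in one tensor, i.e. 4 channels.
RGBA_CHANNELS = 4


def spatial_scale(prev, nxt):
    """Return the integer upsampling factor between two consecutive stages."""
    assert nxt[0] % prev[0] == 0 and nxt[1] % prev[1] == 0
    assert nxt[0] // prev[0] == nxt[1] // prev[1]
    return nxt[0] // prev[0]


# Upsampling factor for each transition in the quoted chain.
scales = [spatial_scale(a, b) for a, b in zip(quoted_stages, quoted_stages[1:])]

final_channels = quoted_stages[-1][2]
missing_channels = RGBA_CHANNELS - final_channels

for shape, scale in zip(quoted_stages[1:], scales):
    print(shape, "upsample x%d" % scale)
print("final channels:", final_channels, "| short of RGBA by:", missing_channels)
```

If the alpha channel is produced elsewhere (for example by a separate head), the 3-channel figure could still be correct, which is part of what I am asking.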

Q2: The input to the decoder is $(x_a, \hat{I})$. Why not feed it only $x_a$, since $x_a$ already contains all the information to be decoded? If that does not work, why not?

Q3: Is the U-Net decoder a must? Is it because $\hat{I}_c$ is too hard to reconstruct, given that the background information is discarded in the premultiplied $I$ yet is still required for the reconstruction?
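The information loss I have in mind in Q3 can be illustrated with a small sketch. It assumes the usual definition of premultiplication (color times alpha, with values in [0, 1]); the helper names `premultiply` and `unpremultiply` are hypothetical and not taken from the repository.

```python
def premultiply(rgb, alpha):
    """Scale each color component by alpha (standard premultiplied alpha)."""
    return tuple(c * alpha for c in rgb)


def unpremultiply(rgb_pm, alpha):
    """Invert premultiplication; returns None where the color is unrecoverable."""
    if alpha == 0:
        return None  # the color was multiplied by zero, so nothing is left to recover
    return tuple(c / alpha for c in rgb_pm)


red = (1.0, 0.0, 0.0)
blue = (0.0, 0.0, 1.0)

# Two different colors under fully transparent pixels collapse to the same value.
pm_red_clear = premultiply(red, 0.0)
pm_blue_clear = premultiply(blue, 0.0)

# With nonzero alpha the color can still be recovered exactly.
pm_red_half = premultiply(red, 0.5)

print(pm_red_clear == pm_blue_clear)
print(unpremultiply(pm_red_half, 0.5))
print(unpremultiply(pm_red_clear, 0.0))
```

Under this assumption, the color beneath fully transparent pixels cannot be recovered from the premultiplied image alone, so the decoder would have to infer it from some other input.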

Xudangliatiger, Mar 04 '24 09:03