Why encoder-decoder architecture?
Hi @LTH14! Congrats on your nice work being accepted to CVPR. As the title asks, I'm confused about why you chose an encoder-decoder architecture like MAE. Have you ever tried an encoder-only architecture like BEiT?
We haven't tried an encoder-only structure like BEiT. We chose the MAE-style encoder-decoder structure simply because it was the state-of-the-art method at the time. In addition, an encoder-decoder structure lets us decouple representation learning from generation: the encoder only sees visible tokens and focuses on learning good representations, while a lightweight decoder handles reconstructing the masked tokens.
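To make the decoupling concrete, here is a minimal numpy sketch of the MAE-style split (all weights, names, and dimensions here are hypothetical stand-ins, not the actual MAGE code): the encoder operates only on visible tokens, and a separate decoder re-inserts learned mask tokens before reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, dim = 16, 8
W_enc = rng.standard_normal((dim, dim))   # stand-in for encoder weights
W_dec = rng.standard_normal((dim, dim))   # stand-in for decoder weights
mask_embed = rng.standard_normal(dim)     # stand-in for a learned [MASK] embedding

def encode(tokens, mask):
    # MAE-style encoder: masked positions are dropped entirely, so the
    # encoder is purely a representation learner over visible tokens.
    visible = tokens[~mask]
    return visible @ W_enc

def decode(latents, mask):
    # Lightweight decoder: re-insert mask embeddings at masked positions,
    # then reconstruct the full token sequence (the "generation" half).
    full = np.empty((mask.size, latents.shape[1]))
    full[~mask] = latents
    full[mask] = mask_embed
    return full @ W_dec

tokens = rng.standard_normal((num_tokens, dim))
mask = rng.random(num_tokens) < 0.75      # high mask ratio, as in MAE/MAGE

latents = encode(tokens, mask)            # representation learning only
recon = decode(latents, mask)             # reconstruction of all tokens
```

Because the encoder never processes mask tokens, its representations can be used downstream (e.g., linear probing) without the decoder, while the decoder alone bears the reconstruction burden.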