rcg
What will happen if CLIP image representation is used to replace SSL representation?
Hi, author! Thanks for sharing this impressive work! I have two questions. First, what would happen if a CLIP image representation were used in place of the SSL representation in the first two stages? Second, why not also adopt a diffusion model in the third stage? What are the advantages of using MAGE there compared with a diffusion model?
Looking forward to your reply!