taming-transformers
Small question about the VQ-VAE paper
Hello! Thank you for your great work!
I have a question about the loss function in the paper:
L = log p(x|z_q(x)) + ||sg[z_e(x)] − e||² + β||z_e(x) − sg[e]||²
The authors mention that the third term exists because e can grow arbitrarily if it doesn't train as fast as the encoder parameters. But as far as I can tell, that term only helps the encoder train faster.
Does it help e train faster too? I assumed that sg[e] means e is not trained by that term. I hope this isn't a silly question ;) Thanks in advance.
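
To make the question concrete, the following is a minimal scalar sketch of the second and third terms. It is not code from this repository. It assumes sg[.] can be modelled by passing a frozen copy of the value that is left untouched while the live copy is perturbed, and it estimates gradients with central differences. The function names, the sample point (z = 2.0, e = 0.5) and the step size are illustrative assumptions; β = 0.25 is, to my knowledge, the value reported in the paper.

```python
# Scalar sketch of the codebook and commitment terms of the VQ-VAE loss.
# Assumption: sg[.] (stop-gradient) is modelled by a separate "frozen" argument
# that keeps its value while the "live" argument is perturbed.

BETA = 0.25  # commitment weight (assumed here; believed to match the paper)


def codebook_term(z_live, e_live, z_sg, e_sg):
    """||sg[z_e(x)] - e||^2: z enters only through sg, so only e is live."""
    return (z_sg - e_live) ** 2


def commitment_term(z_live, e_live, z_sg, e_sg, beta=BETA):
    """beta * ||z_e(x) - sg[e]||^2: e enters only through sg, so only z is live."""
    return beta * (z_live - e_sg) ** 2


def grad_wrt_z(term, z, e, h=1e-6):
    """Central difference in the live encoder output; frozen copies stay at (z, e)."""
    return (term(z + h, e, z, e) - term(z - h, e, z, e)) / (2 * h)


def grad_wrt_e(term, z, e, h=1e-6):
    """Central difference in the live embedding; frozen copies stay at (z, e)."""
    return (term(z, e + h, z, e) - term(z, e - h, z, e)) / (2 * h)


z, e = 2.0, 0.5  # hypothetical encoder output and selected embedding

codebook_dz = grad_wrt_z(codebook_term, z, e)      # expected: zero
codebook_de = grad_wrt_e(codebook_term, z, e)      # expected: -2 * (z - e)
commitment_dz = grad_wrt_z(commitment_term, z, e)  # expected: 2 * BETA * (z - e)
commitment_de = grad_wrt_e(commitment_term, z, e)  # expected: zero

print("codebook term:   d/dz =", codebook_dz, " d/de =", codebook_de)
print("commitment term: d/dz =", commitment_dz, " d/de =", commitment_de)
```

Under these assumptions the third term contributes no gradient to e. The embedding is moved only by the second term, while the third term pulls the encoder output toward the selected e, which matches the reading of sg[e] in the question.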