vector-quantize-pytorch
EMA update on CosineCodebook
The original ViT-VQGAN paper does not seem to use an EMA update for codebook learning, since its codebook consists of unit-normalized vectors.
In particular, to my understanding, an EMA update does not quite make sense when both the encoder outputs and the codebook vectors are unit-normalized.
What's your take on this? Should we NOT use the EMA update with CosineCodebook?
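For concreteness, here is a minimal sketch of what skipping EMA might look like with this library. It is not a recommendation from the maintainer, just an illustration: `use_cosine_sim` is documented in the README, while `learnable_codebook` (and, in newer versions, an `ema_update` flag) should be checked against the constructor of your installed version.

```python
import torch
from vector_quantize_pytorch import VectorQuantize

# Sketch: cosine-sim (unit-normalized) codebook learned by gradients instead of EMA.
# `learnable_codebook` is assumed to route codebook learning through the optimizer;
# verify this flag exists and behaves this way in your installed version.
vq = VectorQuantize(
    dim = 256,
    codebook_size = 512,
    use_cosine_sim = True,      # unit-normalized codes, as in ViT-VQGAN
    learnable_codebook = True,  # update the codebook via gradients rather than EMA
    commitment_weight = 1.
)

x = torch.randn(1, 1024, 256)            # (batch, seq, dim) encoder outputs
quantized, indices, commit_loss = vq(x)  # standard forward signature from the README
```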
Would you like to explain why EMA does not work for a unit-normalized codebook?
I found that when using EMA with the cosine codebook, the L2 norm of the input to the VQ module grows gradually, from about 22 to 20000, leading to a growing training loss. Has anyone else run into this problem?
In case anyone else has this problem: I added a `layernorm` layer after the `vq_in` projection, and the growing-norm problem is largely solved.
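Since the `vq_in` projection (`project_in`) lives inside `VectorQuantize`, a minimal, non-invasive approximation of this workaround is to apply a `LayerNorm` to the encoder output right before it enters the VQ module, which bounds the scale of the vectors the EMA codebook sees. This is a sketch of that idea, not the exact patch described above; the hyperparameters are placeholders.

```python
import torch
from torch import nn
from vector_quantize_pytorch import VectorQuantize

class NormedVQ(nn.Module):
    """Wrap VectorQuantize with a LayerNorm on its input to keep the input norm bounded."""
    def __init__(self, dim = 256, codebook_size = 512):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.vq = VectorQuantize(
            dim = dim,
            codebook_size = codebook_size,
            use_cosine_sim = True
        )

    def forward(self, x):
        # x: (batch, seq, dim) encoder features whose norm would otherwise drift upward
        return self.vq(self.norm(x))

quantized, indices, commit_loss = NormedVQ()(torch.randn(1, 1024, 256))
```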
@Saltychtao I also encounter a similar issue. Does vq_in refer to VectorQuantize.project_in?
Yes.
@Saltychtao Hi, just to make sure: the current version of the implementation here already seems to apply a normalization (l2norm) after project_in, yet I still run into the training-loss explosion issue with the current version.