
Codebook embedding does not update

zhxgj opened this issue Apr 24 '20 · 4 comments

I found that `ctx.needs_input_grad[1]` is False while training the VQ-VAE. Is this correct, and does it mean the codebook embedding does not update during training?

https://github.com/ritheshkumar95/pytorch-vqvae/blob/8d123c0d043bebc8734d37785dd13dd20e7e5e0e/functions.py#L53
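
For reference, here is a minimal, hypothetical sketch (not this repo's code) of how `ctx.needs_input_grad` behaves when a custom `Function` receives a detached codebook weight, which is what I believe the model passes to `vq_st`:

```python
import torch
from torch.autograd import Function

class ToyStraightThrough(Function):
    @staticmethod
    def forward(ctx, inputs, codebook):
        # Pretend quantization: pass the inputs through unchanged.
        return inputs.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # needs_input_grad[i] is True only if forward's i-th input
        # is part of a graph that requires gradients.
        print("needs_input_grad:", ctx.needs_input_grad)
        grad_inputs = grad_output if ctx.needs_input_grad[0] else None
        return grad_inputs, None

x = torch.randn(4, 8, requires_grad=True)
codebook = torch.nn.Embedding(16, 8)

# A detached weight does not require grad, so needs_input_grad[1] is False.
out = ToyStraightThrough.apply(x, codebook.weight.detach())
out.sum().backward()  # prints: needs_input_grad: (True, False)
```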

zhxgj avatar Apr 24 '20 22:04 zhxgj

I agree with the comment above. It is strange.

zhangbo2008 avatar Sep 10 '21 23:09 zhangbo2008

> I found that `ctx.needs_input_grad[1]` is False while training the VQ-VAE. Is this correct, and does it mean the codebook embedding does not update during training?
>
> https://github.com/ritheshkumar95/pytorch-vqvae/blob/8d123c0d043bebc8734d37785dd13dd20e7e5e0e/functions.py#L53

That part of the code is never executed! But I printed `model.codebook.embedding.weight.data` and found that the codebook is still updated!
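
A quick way to reproduce that check (a sketch only; `model`, `batch`, `loss_fn`, and `optimizer` are placeholders for your own training loop, not this repo's exact code):

```python
import torch

# Snapshot the codebook weight, take one optimizer step, and compare.
before = model.codebook.embedding.weight.detach().clone()

loss = loss_fn(model(batch), batch)  # placeholder loss computation
optimizer.zero_grad()
loss.backward()
optimizer.step()

after = model.codebook.embedding.weight.detach().clone()
print("codebook changed:", not torch.equal(before, after))
```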

chenaoxuan avatar Dec 26 '22 09:12 chenaoxuan

Actually, `ctx.needs_input_grad[0]` and `ctx.needs_input_grad[1]` are set to true and false alternately: on the 1st step, `ctx.needs_input_grad[0]` is true and `ctx.needs_input_grad[1]` is false; on the 2nd step, `ctx.needs_input_grad[0]` is false and `ctx.needs_input_grad[1]` is true; on the 3rd step, `ctx.needs_input_grad[0]` is true and `ctx.needs_input_grad[1]` is false again; and so on.

This setting is reasonable because there are two "agents", namely the codebook and the autoencoder, each updating w.r.t. a different part of the loss function.
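
For concreteness, here is a sketch of the standard VQ-VAE objective that this split refers to (variable names `x`, `x_tilde`, `z_e_x`, `z_q_x`, and `beta` are placeholders; I believe this repo's training script uses essentially these terms):

```python
import torch.nn.functional as F

# Reconstruction term: gradients reach the encoder and decoder.
loss_recons = F.mse_loss(x_tilde, x)
# Codebook term: z_e_x is detached, so only the codebook is updated.
loss_vq = F.mse_loss(z_q_x, z_e_x.detach())
# Commitment term: z_q_x is detached, so only the encoder is updated.
loss_commit = F.mse_loss(z_e_x, z_q_x.detach())

loss = loss_recons + loss_vq + beta * loss_commit
```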

Roller44 avatar Jul 12 '23 07:07 Roller44

> Actually, `ctx.needs_input_grad[0]` and `ctx.needs_input_grad[1]` are set to true and false alternately: on the 1st step, `ctx.needs_input_grad[0]` is true and `ctx.needs_input_grad[1]` is false; on the 2nd step, `ctx.needs_input_grad[0]` is false and `ctx.needs_input_grad[1]` is true; on the 3rd step, `ctx.needs_input_grad[0]` is true and `ctx.needs_input_grad[1]` is false again; and so on.
>
> This setting is reasonable because there are two "agents", namely the codebook and the autoencoder, each updating w.r.t. a different part of the loss function.

I debugged the code and found that `ctx.needs_input_grad[1]` is always false, rather than being set to true and false alternately. A basic fact: if a variable $A$ does not require a gradient, that does not mean it will never be updated during optimization. The `requires_grad` attribute only describes whether a gradient should be computed through that use of $A$. In other words, it controls whether gradients are propagated through that use of $A$ to other variables, not whether $A$ itself can be updated by some other path!

Therefore, even though `ctx.needs_input_grad[1]` is always False, the codebook can still be updated, because it receives its gradient through the separate codebook-loss path.
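
A minimal sketch of this point (hypothetical stand-ins, not the repo's code): the straight-through path only ever sees a detached copy of the weight, while the codebook loss uses an ordinary embedding lookup, and that second path is what updates the weight:

```python
import torch
import torch.nn.functional as F

emb = torch.nn.Embedding(16, 8)          # stand-in for the codebook
opt = torch.optim.SGD(emb.parameters(), lr=0.1)
z_e = torch.randn(4, 8)                  # stand-in for encoder outputs
idx = torch.randint(0, 16, (4,))         # stand-in for nearest-code indices

# Path 1: the straight-through estimator sees only a detached copy,
# so no gradient for emb.weight is requested through it.
detached_codes = emb.weight.detach()[idx]

# Path 2: an ordinary lookup in the codebook loss; this is where
# emb.weight actually receives its gradient.
loss_vq = F.mse_loss(emb(idx), z_e.detach())

before = emb.weight.detach().clone()
loss_vq.backward()
opt.step()
print("codebook updated:", not torch.equal(before, emb.weight))  # True
```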

RipeMangoBox avatar May 10 '24 11:05 RipeMangoBox