practical-pytorch
question: seq2seq with attention tutorial, optimizing encoder and decoder separately?
Regarding this tutorial: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

I have a question (which probably sounds very stupid): is it necessary to optimize the parameters of the encoder and the decoder separately here?
```python
encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
...
loss.backward()
encoder_optimizer.step()
decoder_optimizer.step()
```
So decoder.parameters() doesn't include encoder.parameters()? I can't just call decoder_optimizer.step() on its own? The loss is backpropagated all the way through, but not all of the parameters get updated?
thanks
Though this is outdated already, to the best of my understanding: in PyTorch, a parameter becomes registered to a Module when it is assigned to self in the constructor. The encoder and decoder therefore each hold references only to the parameters of the submodules registered to them. In other words, encoder.parameters() returns only the weights and biases contained within the encoder object, and similarly for the decoder. With such a design, the backward call computes gradients for all parameters, but decoder_optimizer.step() only updates the weights of the decoder's layers.
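Just to illustrate, here is a minimal self-contained sketch (the layer shapes and module contents are made up, not the tutorial's actual models) showing that the two parameter sets are disjoint:

```python
import torch.nn as nn

# Each Module only registers the parameters assigned to self in its constructor.
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(10, 4)  # registered to the encoder
        self.gru = nn.GRU(4, 4)               # registered to the encoder

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(4, 4)               # registered to the decoder
        self.out = nn.Linear(4, 10)           # registered to the decoder

encoder, decoder = Encoder(), Decoder()

enc_params = {id(p) for p in encoder.parameters()}
dec_params = {id(p) for p in decoder.parameters()}

# The two parameter sets do not overlap: decoder.parameters() contains none of
# the encoder's weights, so decoder_optimizer.step() leaves the encoder
# untouched even though backward() fills gradients for both models.
print(enc_params & dec_params)  # set()
```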
I have, however, another question on a similar topic. My approach to passing the parameters of a multi-module architecture to a single PyTorch optimizer object is as follows:
```python
optim = torch.optim.Adam([
    {'params': text_encoder.parameters()},
    {'params': speech_encoder.parameters()},
    {'params': decoder.parameters()},
])
```
Is there any particular reason for creating separate optimizer objects in such a scenario, apart from the fact that it lets us configure the optimizer differently for each module? Is there some mistake in my approach?
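For what it's worth, a single optimizer with parameter groups can also give each module its own settings; as far as I know it updates exactly the same parameters as separate optimizers would after one backward() call. Here is a minimal self-contained sketch (using toy nn.Linear stand-ins for text_encoder, speech_encoder, and decoder, with made-up learning rates):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the three modules; the shapes are arbitrary.
text_encoder = nn.Linear(8, 4)
speech_encoder = nn.Linear(8, 4)
decoder = nn.Linear(8, 2)

# One optimizer, three parameter groups. Each group can override the defaults
# (e.g. learning rate), so a single optimizer is usually enough unless you want
# different optimizer classes per module or want to step/zero them separately.
optimizer = torch.optim.Adam([
    {'params': text_encoder.parameters()},
    {'params': speech_encoder.parameters(), 'lr': 1e-4},  # per-group override
    {'params': decoder.parameters()},
], lr=1e-3)                                               # default for the other groups

# Training-step sketch with a dummy loss touching all three modules:
loss = (text_encoder(torch.randn(1, 8)).sum()
        + speech_encoder(torch.randn(1, 8)).sum()
        + decoder(torch.randn(1, 8)).sum())
loss.backward()
optimizer.step()       # updates all three modules in one call
optimizer.zero_grad()
```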