
question: seq2seq with attention tutorial, optimizing encoder and decoder separately?

Open pucktada opened this issue 7 years ago • 1 comment

regarding this tutorial: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

I just have a question (which probably sounds very stupid). I am wondering: is it necessary to optimize the parameters of the encoder and the decoder separately here?

encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
...
loss.backward()
encoder_optimizer.step()
decoder_optimizer.step()

So decoder.parameters() doesn't include encoder.parameters()? I can't just call decoder_optimizer.step() on its own? The loss is backpropagated all the way through both models, but the encoder's parameters wouldn't be updated?

thanks

pucktada avatar Sep 15 '18 14:09 pucktada

Though this is outdated already, to the best of my understanding: in PyTorch, a parameter becomes registered to a Module when it is assigned to self in the constructor. The encoder and decoder therefore hold references only to the parameters of the submodules that were registered to them. In other words, encoder.parameters() returns only the weights and biases contained within the encoder object, and similarly for the decoder. With such a design, the backward call computes all the gradients, but the

decoder_optimizer.step() 

would only update the weights of the decoder object's layers.
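To make that concrete, here is a minimal sketch (not from the tutorial; the GRU modules below are hypothetical stand-ins for its encoder and decoder) showing that the two parameter sets are disjoint, and that a single optimizer over both modules is an equally valid alternative to the two separate ones:

import itertools
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-ins for the tutorial's encoder and decoder modules.
encoder = nn.GRU(input_size=10, hidden_size=16)
decoder = nn.GRU(input_size=16, hidden_size=16)

# The two parameter sets are disjoint: neither module registers the other's weights.
assert {id(p) for p in encoder.parameters()}.isdisjoint({id(p) for p in decoder.parameters()})

# Option 1 (as in the tutorial): one optimizer per module.
encoder_optimizer = optim.SGD(encoder.parameters(), lr=0.01)
decoder_optimizer = optim.SGD(decoder.parameters(), lr=0.01)

# Option 2: a single optimizer over both modules' parameters;
# one step() then updates the encoder and the decoder together.
joint_optimizer = optim.SGD(itertools.chain(encoder.parameters(), decoder.parameters()), lr=0.01)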

I have, however, another question on a similar topic. My approach for assigning parameters to a PyTorch optimizer object in a multi-model architecture is as follows:

optim = torch.optim.Adam([
                 {'params': text_encoder.parameters()},
                 {'params': speech_encoder.parameters()},
                 {'params': decoder.parameters()}
             ])

Is there any particular reason for using separate optimizer objects in such a scenario, apart from the fact that it lets us configure the optimizer differently for each module? Is there some mistake in my approach?

kwasnydam avatar Apr 19 '19 07:04 kwasnydam
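For reference, per-module settings are also possible within a single optimizer by passing options inside each parameter group. A minimal sketch, assuming text_encoder, speech_encoder, and decoder are ordinary nn.Module instances (the Linear layers below are hypothetical placeholders):

import torch
import torch.nn as nn

# Hypothetical placeholder modules standing in for the real submodels.
text_encoder = nn.Linear(32, 64)
speech_encoder = nn.Linear(40, 64)
decoder = nn.Linear(64, 10)

# A single optimizer with parameter groups: options set in a group override
# the defaults given after the list, so each module can still get its own lr.
optimizer = torch.optim.Adam([
    {'params': text_encoder.parameters(), 'lr': 1e-4},
    {'params': speech_encoder.parameters(), 'lr': 1e-4},
    {'params': decoder.parameters()}  # falls back to the default lr=1e-3
], lr=1e-3)

# One backward pass and one step() then update all three modules together:
# loss.backward(); optimizer.step()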