gradient computation has been modified by an inplace operation
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1, 1, 7]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck
The problem seems to lie in this class:

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Conv1d):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # pad only on the left so the convolution remains causal
        self.causal_padding = self.dilation[0] * (self.kernel_size[0] - 1)

    def forward(self, x):
        return self._conv_forward(F.pad(x, [self.causal_padding, 0]), self.weight, self.bias)
```
Training the discriminator first and then the generator solves this problem, but I am not sure whether it will affect the training accuracy. I have read many GAN implementations; some train the generator first and others train the discriminator first, so I am not sure which order is correct. I will let the network run and see what happens.
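For anyone else hitting this, below is a minimal sketch of that "discriminator first" ordering. All names (`generator`, `discriminator`, `opt_g`, `opt_d`, `adversarial_loss`) are placeholders, not this repo's identifiers; the two points that avoid the in-place/version error are detaching the fakes for the discriminator step and re-running the discriminator forward pass after its optimizer step.

```python
def train_step(x, generator, discriminator, opt_g, opt_d, adversarial_loss):
    # --- discriminator update ---
    x_hat = generator(x)
    # detach the fakes so the discriminator backward pass never touches the generator graph
    d_loss = adversarial_loss(discriminator(x), discriminator(x_hat.detach()))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()  # in-place weight update happens before the generator graph is built

    # --- generator update, with a fresh discriminator forward pass ---
    g_loss = -discriminator(x_hat).mean()  # placeholder adversarial term
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```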
Were you able to get a good result?
I haven't started training yet, but I will update my results here as soon as I do.
Hi, thanks for your issue! I encountered this in-place operation issue in other parts of the code as well; some other parts might need a fix for it too. Regarding the training order of the components, I don't think there is any consensus.
Thanks! I will open a pull request with my fix for these issues if it helps.
@wesbz @liuyoude One problem I found is that the generator loss is significantly larger than the discriminator loss, for example 6199809.67 vs 2.35. Could this be causing the model not to converge?
Hi, to the best of my knowledge, since you have two losses and two optimizers, the differences in the gradients' amplitudes shouldn't tell you much about the model's convergence. But one should always challenge their beliefs. What would you suggest? Clipping the gradients?
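(For concreteness, here is a tiny self-contained example of what gradient clipping looks like in plain PyTorch; the linear model and optimizer below are dummies for illustration, not anything from this repo.)

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1)                                   # dummy model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x, y = torch.randn(8, 16), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip before stepping
optimizer.step()
```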
@wesbz Yeah, I agree with what you said; sorry, I rushed to that conclusion. I am training the network now and can see the discriminator and generator losses decreasing, but it is still very slow. I will update here once I find something, thanks again!
Yeah, the spectral reconstruction loss in the paper refers to the GED loss (https://github.com/google-research/google-research/tree/68c738421186ce85339bfee16bf3ca2ea3ec16e4/ged_tts). I tried running it and the values it produces are small, but the details of that code differ from what the paper describes.
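For reference, here is a rough sketch of a multi-scale mel-spectral reconstruction loss along the lines of the paper's description (window lengths 2^6 to 2^11, hop = window/4, L1 on the magnitudes plus a weighted L2 on the log magnitudes). It assumes torchaudio is available; the `n_mels`, `eps`, and `alpha` choices are my reading of the paper, not values taken from this repository or the GED code.

```python
import torch
import torchaudio

class SpectralReconstructionLoss(torch.nn.Module):
    def __init__(self, sample_rate=24000, n_mels=64, eps=1e-4):
        super().__init__()
        self.eps = eps
        self.scales = list(range(6, 12))  # window lengths 2**6 .. 2**11
        self.mels = torch.nn.ModuleList([
            torchaudio.transforms.MelSpectrogram(
                sample_rate=sample_rate,
                n_fft=2 ** i,
                hop_length=2 ** i // 4,
                n_mels=n_mels,
                power=1.0,  # magnitude spectrogram
            )
            for i in self.scales
        ])

    def forward(self, x, x_hat):
        # x, x_hat: waveforms of shape (batch, time)
        loss = 0.0
        for i, mel in zip(self.scales, self.mels):
            s, s_hat = mel(x), mel(x_hat)
            alpha = (2 ** i / 2) ** 0.5  # weight of the log-magnitude term at this scale
            loss = loss + (s - s_hat).abs().mean()
            loss = loss + alpha * ((s.clamp(min=self.eps).log()
                                    - s_hat.clamp(min=self.eps).log()) ** 2).mean()
        return loss
```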
@wesbz @liuyoude Here are my loss curves:
Any suggestions on why it didn't converge?
The hyperparameters are the same as in the paper's implementation.
I think it may be caused by the weighting of the generator loss terms.
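If it is the weighting, one simple thing to check is how the generator terms are combined. A hypothetical helper is shown below; the default lambdas are placeholders to experiment with, not numbers verified against the paper or this repo.

```python
def generator_objective(adv_loss, feat_loss, rec_loss,
                        lambda_adv=1.0, lambda_feat=100.0, lambda_rec=1.0):
    # Weighted sum of the adversarial, feature-matching, and reconstruction terms.
    # Tune the lambdas so that no single term dominates the total loss.
    return lambda_adv * adv_loss + lambda_feat * feat_loss + lambda_rec * rec_loss
```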
@MasterEndless any progress on convergence? I have the exact same issue. I am training with LibriSpeech data, 3-second normalized clips.
I found that the implementation of the generator loss differs from what the original paper describes. After modifying it I restarted training, but the generator still has not converged...
@MasterEndless did you find a solution for convergence? Meta recently released their neural codec code, but I ran into the same problem with their model as well. The issue is the convergence of the loss: I have tried MSE loss on the waveform, the spectral loss as specified, and L1 and L2 losses on the waveform, but none of them led to convergence.
@wesbz are you planning to work on the convergence of the model or did you find anything that may help us?
@liuyoude did you also find a solution for the spectral reconstruction loss?