
Training Instability with torch_pesq: Loss Gradually Becomes NaN

Open · Thanatoz-1 opened this issue 6 months ago · 1 comment

When training my model with torch_pesq integrated, the loss gradually drifts to NaN, causing gradients to break and training to halt.

I am using this along with MultiResolutionSpectralLoss as follows:

def __init__(self, *args, **kwargs):
    ...
    # PESQ-based perceptual loss; `factor` scales its contribution to the total loss
    self.pesq_loss = PesqLoss(sample_rate=self.target_sr, factor=10)

def training_step(self, batch, batch_idx):
    ...
    # PesqLoss takes the reference first, then the estimate; squeeze the channel dim to (batch, time)
    loss_pesq = self.pesq_loss(wav.squeeze(1), wav_hat.squeeze(1)).mean()
    reconstruction_loss = loss_mrl + loss_pesq
    ...


I’ve experimented with various gradient clipping values, but the issue persists. I'm unsure why the loss continues to drift toward NaN. Any insights or suggestions would be greatly appreciated.
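
For reference, a minimal sketch of how the first non-finite term could be localized (names taken from the snippet above; this is just a debugging aid, not part of torch_pesq): enable autograd anomaly detection and check each loss term separately.

import torch

# Enable once at startup: the backward pass will then report which op first produced a NaN/Inf
torch.autograd.set_detect_anomaly(True)

def training_step(self, batch, batch_idx):
    ...
    loss_pesq = self.pesq_loss(wav.squeeze(1), wav_hat.squeeze(1)).mean()
    reconstruction_loss = loss_mrl + loss_pesq

    # Check each term on its own so the first non-finite one shows up in the logs
    for name, value in {"loss_mrl": loss_mrl, "loss_pesq": loss_pesq}.items():
        if not torch.isfinite(value).all():
            print(f"{name} became non-finite at global step {self.global_step}")
    ...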

Thanatoz-1 · May 15 '25, 16:05

Not sure what the cause may be, but can you try combining it with a simple MSE loss first and see whether it still diverges?
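
Something like this rough sketch, based on the snippet above, would isolate the PESQ term (a plain waveform MSE stands in for the multi-resolution spectral loss):

import torch.nn.functional as F

def training_step(self, batch, batch_idx):
    ...
    loss_pesq = self.pesq_loss(wav.squeeze(1), wav_hat.squeeze(1)).mean()
    # Replace loss_mrl with a simple MSE on the raw waveforms
    loss_mse = F.mse_loss(wav_hat, wav)
    reconstruction_loss = loss_mse + loss_pesq
    ...

If the loss still goes to NaN with MSE only, that points at the PESQ term (or the data feeding it) rather than the multi-resolution loss.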

bytesnake · May 26 '25, 08:05