RAVE
`valid_signal_crop` in `validation_step`?
I noticed when training causal models with RAVE v2 that the validation audio sounds pretty bad. If I'm understanding correctly, it's because v2 crops to the valid (as in convolution) portion of the signal, so the part of the reconstruction which is affected by zero padding (~2/3 of it with v2 defaults) is not trained at all. But `validation_step` doesn't do the same cropping, so the validation curve looks very noisy and the audio sounds bad in tensorboard.
Would it make sense to include the same cropping in `validation_step`?
The cropping is only useful during training, where it drops signal with zero gradients. Cropping in `validation_step` would not make much sense, and would mess with the output dimensionality. Furthermore, the logged audio is not related to the curves; causal configurations unfortunately limit RAVE's modelling capacity, so the sound quality may be due to the training and configuration. I don't know if @caillonantoine would have additional comments?
Agree that this change doesn't affect training, only logging.
However, I'm quite certain it works as described; I've been using it on my fork. Since the beginning of the reconstruction gets cropped from the loss during training, I believe the model is incentivized to collapse the corresponding latents (i.e., those influenced by zero padding) to the prior. So the beginning of the reconstruction ends up unrelated to the input. This leads to high reconstruction error when that part isn't cropped at validation time, which makes the validation curve in tensorboard noisy and unreadable. Also, I'm quite certain the logged audio is affected. It's the same audio computed in `validation_step` which gets logged in `validation_epoch_end` (https://github.com/acids-ircam/RAVE/blob/b67a1872b5dc6a9875970d1339bc0803a3832c6e/rave/model.py#L457), no?
This change only shortens the logged audio by slicing off the 'random' prior-collapsed part, but I find it easier to hear how faithful the reconstructions are this way.
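
To make the effect concrete, here is a minimal toy sketch in plain Python, not the actual RAVE code. The helper names `crop_to_valid` and `mean_abs_error`, the sample values, and the receptive field of 4 samples are all made up for illustration; in the real model the same slicing would presumably be applied along the time axis of the input and reconstruction tensors before computing the validation distance and logging audio.

```python
def crop_to_valid(signal, left_rf, right_rf=0):
    """Drop the samples at each end that are influenced by zero padding.

    `signal` is a plain list of samples; `left_rf` and `right_rf` are
    hypothetical receptive-field sizes in samples. For a causal model
    only the left side is affected, so `right_rf` defaults to 0.
    """
    end = len(signal) - right_rf
    return signal[left_rf:end]


def mean_abs_error(x, y):
    """Mean absolute difference between two equal-length signals."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Toy data: the first 4 reconstructed samples are unrelated to the input
# (standing in for the prior-collapsed region), the rest match exactly.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]    # input
y = [0.9, -0.7, 0.8, -0.6, 0.5, 0.6]  # reconstruction
left_rf = 4                           # assumed value, illustration only

# Error over the whole signal, as an uncropped validation step would see it.
full_error = mean_abs_error(x, y)

# Error over the valid region only, matching what the training loss sees.
valid_error = mean_abs_error(crop_to_valid(x, left_rf),
                             crop_to_valid(y, left_rf))

print(f"uncropped error: {full_error:.3f}")
print(f"cropped error:   {valid_error:.3f}")
```

In this toy case the uncropped error is dominated by the region the model was never trained on, while the cropped error reflects only the part the training loss actually covers.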