vq-vae-2-pytorch
Cannot reconstruct when using mel spectrograms as data
I used the code and settings as-is, with one change: I used conv1d to process mel spectrograms, which can be treated as 1-D data with 80 channels. However, the reconstruction is quite poor and the loss converges to a large value. Do you have any guess about the cause, or any suggestions for debugging? Thanks a lot!
Sorry for the late reply. I haven't tried this model in the audio domain, but I suspect that data normalization and preprocessing are crucial for log-mel spectrograms, as this model doesn't have any dedicated normalization layers.
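To make the normalization suggestion concrete, here is a minimal sketch of one common option: per-mel-bin standardization using statistics computed over the training set. This is not something from the repo. The helper names (`compute_channel_stats`, `normalize`, `denormalize`) are made up for illustration, the data is random stand-in values rather than real log-mel spectrograms, and whether this actually fixes the reconstruction is untested.

```python
import numpy as np

def compute_channel_stats(log_mels):
    """Per-mel-bin mean and std over a list of (n_mels, n_frames) arrays."""
    stacked = np.concatenate(log_mels, axis=1)       # (n_mels, total_frames)
    mean = stacked.mean(axis=1, keepdims=True)       # (n_mels, 1)
    std = stacked.std(axis=1, keepdims=True) + 1e-5  # epsilon avoids division by zero
    return mean, std

def normalize(log_mel, mean, std):
    """Standardize each mel bin to roughly zero mean and unit variance."""
    return (log_mel - mean) / std

def denormalize(x, mean, std):
    """Invert normalize(), e.g. before passing a reconstruction to a vocoder."""
    return x * std + mean

# Stand-in data (assumption): 3 utterances, 80 mel bins, varying lengths,
# with values drawn from an arbitrary range instead of real audio features.
rng = np.random.default_rng(0)
dataset = [rng.uniform(-11.5, 2.0, size=(80, n)) for n in (120, 200, 160)]

mean, std = compute_channel_stats(dataset)
normalized = [normalize(m, mean, std) for m in dataset]
all_frames = np.concatenate(normalized, axis=1)
```

Each normalized array keeps the (80, n_frames) layout, so it can be fed to conv1d with 80 input channels after adding a batch dimension. The statistics should come from the training split only and be reused unchanged at validation and inference time.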