
Why is BigVGAN better than Avocodo?

Open Ca-ressemble-a-du-fake opened this issue 1 year ago • 2 comments

Hi,

I fine-tuned the Toucan Meta model on 1k on a reduced dataset to understand the difference between Avocodo and BigVGAN.

Here are the spectrograms:

image

Apart from the 12 kHz area, which is a bit larger for the BigVGAN version, I can barely see any differences (I may not be looking in the right place). Where are the improvements?

By the way, why is there a dark strip around 12 kHz?

Thanks in advance for your explanations!

Ca-ressemble-a-du-fake avatar Apr 25 '23 05:04 Ca-ressemble-a-du-fake

You cannot see the difference in the spectrogram because a spectrogram does not contain the phase information, which is what the vocoder tries to reconstruct. Since the input to the vocoder is already a spectrogram, the spectrogram of the output will ideally just match the input again.
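
To make the point about missing phase concrete, here is a minimal sketch in plain Python. It is a toy illustration and not IMS-Toucan code: it uses a single DFT frame instead of a full spectrogram, and the helper names `dft` and `two_tone` are made up for this example. Two signals that differ only in the phase of one component have the same magnitude spectrum, even though their waveforms clearly differ.

```python
import cmath
import math

N = 64  # frame length in samples (arbitrary choice for this toy example)

def dft(signal):
    """Naive discrete Fourier transform of a real-valued list."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def two_tone(phase):
    """Two sinusoids (3 and 7 cycles per frame); only the second one is phase-shifted."""
    return [
        math.sin(2 * math.pi * 3 * t / N) + math.sin(2 * math.pi * 7 * t / N + phase)
        for t in range(N)
    ]

a = two_tone(0.0)
b = two_tone(math.pi / 2)

# Magnitudes are what a spectrogram frame keeps; the phase is discarded.
mag_a = [abs(c) for c in dft(a)]
mag_b = [abs(c) for c in dft(b)]

max_mag_diff = max(abs(x - y) for x, y in zip(mag_a, mag_b))
max_wave_diff = max(abs(x - y) for x, y in zip(a, b))

print("largest difference between magnitude spectra:", max_mag_diff)
print("largest difference between waveforms:", max_wave_diff)
```

The magnitude difference is only floating-point noise, while the waveforms differ by more than one full amplitude unit, so a magnitude-only plot cannot tell the two signals apart.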

The improvement lies in the generator of BigVGAN, which has mechanisms built in to avoid aliasing during the upsampling process.

The strip at 12 kHz appears because the signal is sampled at 24 kHz. Anything in a spectrogram above the Nyquist frequency (half the sampling rate, i.e. 12 kHz in this case) is due to a problem called imaging, which is something we want to avoid.
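
Imaging can be reproduced with a small sketch in plain Python. The assumptions here are mine and are not taken from BigVGAN or IMS-Toucan: a 3 kHz test tone sampled at 24 kHz is upsampled to 48 kHz by naive zero insertion with no filtering, and `dft_magnitudes` is a hypothetical helper. The upsampled spectrum then contains the original tone plus a mirror image above 12 kHz.

```python
import cmath
import math

SR_LOW = 24000   # original sampling rate in Hz
SR_HIGH = 48000  # assumed target rate after 2x upsampling
TONE_HZ = 3000   # test tone, well below the 12 kHz Nyquist frequency
N = 48           # number of samples at the low rate

def dft_magnitudes(signal):
    """Magnitudes of the naive DFT for bins 0..n/2 (0 Hz up to Nyquist)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

low = [math.sin(2 * math.pi * TONE_HZ * t / SR_LOW) for t in range(N)]

# Naive 2x upsampling: insert a zero after every sample, with no low-pass filter.
stuffed = []
for sample in low:
    stuffed.extend([sample, 0.0])

mags = dft_magnitudes(stuffed)
bin_hz = SR_HIGH / len(stuffed)  # frequency spacing between DFT bins

# Keep only bins that carry real energy (everything else is numerical noise).
peaks_hz = [k * bin_hz for k, m in enumerate(mags) if m > 1.0]
print(peaks_hz)
```

Besides the tone at 3 kHz, a copy shows up at 24 kHz - 3 kHz = 21 kHz. Content like this above the original 12 kHz Nyquist frequency is what imaging refers to.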

Flux9665 avatar May 08 '23 20:05 Flux9665

Thanks for your explanations. I tried simply filtering out everything above 12 kHz, but it did not sound better.

Ca-ressemble-a-du-fake avatar May 25 '23 03:05 Ca-ressemble-a-du-fake