Where can I intercept the Mel Spectrogram to save it as .npy?

Open Ca-ressemble-a-du-fake opened this issue 1 year ago • 2 comments

Hi,

Out of curiosity, I want to test BigVGAN. Their page says it accepts .npy files as input. I browsed the code but could not find where the Mel Spectrogram is generated.

Could you please point me to the line of code where I can save the spectrogram, so that I can run BigVGAN on it manually?

Thanks in advance for your help

Ca-ressemble-a-du-fake avatar Mar 23 '23 06:03 Ca-ressemble-a-du-fake

The next release will include BigVGAN; it's already in one of the experimental branches. It works extremely well, especially when paired with the discriminators that Avocodo adds. Unfortunately, it is also very slow.

Here are the spectrograms: https://github.com/DigitalPhonetics/IMS-Toucan/blob/e41e266ccacf282a9854d562f9e3d604f1cf245b/InferenceInterfaces/PortaSpeechInterface.py#L185

I'm not sure their spectrogram settings are the same as ours, though, so I can't say whether their model will work out of the box with outputs from this TTS.
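
For the saving step itself, something along these lines should work. This is only a minimal sketch, not code from the repository: `save_mel_as_npy` is a hypothetical helper, and whether the array needs to be transposed before a vocoder will accept it is an assumption you would have to check against both the variable at the linked line and the layout the vocoder expects.

```python
import numpy as np


def save_mel_as_npy(mel, path, transpose=False):
    """Hypothetical helper: write a 2-D mel spectrogram to a .npy file.

    `mel` may be a NumPy array or a torch tensor (detected by duck typing,
    so torch does not need to be imported here).
    """
    # Torch tensors expose .detach(); move them to CPU and convert to NumPy.
    if hasattr(mel, "detach"):
        mel = mel.detach().cpu().numpy()

    mel = np.asarray(mel, dtype=np.float32)
    if mel.ndim != 2:
        raise ValueError(f"expected a 2-D spectrogram, got shape {mel.shape}")

    # Assumption: the vocoder may want the axes swapped relative to the TTS
    # output; verify the expected layout before enabling this.
    if transpose:
        mel = mel.T

    np.save(path, mel)  # np.save appends ".npy" if the path lacks it
    return mel.shape
```

You would call it right after the spectrogram is produced, e.g. `save_mel_as_npy(mel, "utterance.npy")`, where `mel` stands for whatever variable holds the spectrogram at that point. The caveat about differing spectrogram settings still applies: a file in the right format is not necessarily a spectrogram the other model was trained on.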

Flux9665 avatar Mar 23 '23 12:03 Flux9665

Thank you, will give this a try!

Ca-ressemble-a-du-fake avatar Mar 24 '23 04:03 Ca-ressemble-a-du-fake