
How to combine melGAN with feature predictor like FastSpeech or tacotron2?

Open nikawool opened this issue 4 years ago • 2 comments

How can I combine MelGAN with a feature predictor such as FastSpeech (https://github.com/xcmyz/FastSpeech) or Tacotron2?

nikawool, Feb 25 '20 06:02

Have you tried FastSpeech combined with MelGAN? How were the results?

Liujingxiu23, Apr 20 '20 01:04

I've been playing with Tacotron2's inference notebook, but so far I only get noise. I copied the mel2wav folder and my checkpoint log directory into the tacotron2 directory, then added a section after the "Remove WaveGlow bias" section of the notebook:

vocoder = MelVocoder(path="logs/baseline14k/", model_name="best_netG")
recons = vocoder.inverse(mel_outputs.float()).squeeze().cpu().numpy()
ipd.Audio(recons, rate=22050)

I've also tried:

vocoder = MelVocoder(path="logs/baseline14k/", model_name="best_netG")

recons = vocoder.inverse(mel_outputs.float()).squeeze().cpu().numpy()

meldata = mel_outputs.float()
meldata.shape    # torch.Size([1, 80, 503])
rev_wav = vocoder.inverse(meldata.float())  #.squeeze().cpu().numpy()
rev_wav.shape    # torch.Size([1, 128768])
rev_wav.dtype    # torch.float32
rev_wav2 = rev_wav.cpu().numpy()
rev_wav2.shape   # (1, 128768)
ipd.Audio((rev_wav2.reshape((-1)) * 2**15).astype(np.int16), rate=22050)

Same results.

Teravus, Sep 29 '20 03:09
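
One thing that may be worth checking, offered as a hedged sketch rather than a confirmed diagnosis: the two projects might not compute log-mel features the same way. The shapes reported above are consistent with each other (503 frames times a hop of 256 gives 128768 samples), which suggests the problem is in the feature values rather than the tensor layout. Assuming the Tacotron2 model emits natural-log mels while the MelGAN checkpoint was trained on log10 mels (an assumption to verify in both codebases), the spectrogram would need rescaling before `vocoder.inverse`. The helper names below are hypothetical and exist in neither repository.

```python
import math

# Factor that converts natural-log values to log10 values:
# log10(x) = ln(x) / ln(10).
LN_TO_LOG10 = 1.0 / math.log(10.0)


def ln_mel_to_log10_mel(mel):
    """Rescale a natural-log mel spectrogram to log10.

    Hypothetical helper. It assumes the feature predictor emits ln(mel)
    and the vocoder expects log10(mel); check both codebases first.
    Works on floats, NumPy arrays, and torch tensors, since it is a
    plain scalar multiplication.
    """
    return mel * LN_TO_LOG10


def expected_num_samples(num_frames, hop_length):
    """Number of audio samples a vocoder should emit for num_frames mel frames."""
    return num_frames * hop_length


# Sanity check against the shapes reported in the thread,
# assuming a hop length of 256.
print(expected_num_samples(503, 256))
```

In the notebook this would be used as `vocoder.inverse(ln_mel_to_log10_mel(mel_outputs.float()))`. If the two pipelines also differ in mel filterbank settings (for example the upper frequency limit) or sample rate, a rescale alone will not be enough, and training the vocoder on mels produced by the feature predictor's own preprocessing is probably the safer route.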