iSTFTNet-pytorch
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
Repgan
Thanks very much! https://user-images.githubusercontent.com/16432329/177454772-43428023-5ed3-4c9e-b8af-fb68f1a11ff3.mp4
Thanks for the implementation of iSTFT. It has better inference speed than HiFi-GAN v1. However, I found that there is a single frequency line which causes a little noise. I use 16 kHz...
Hi, thanks for the implementation; the inference speed is impressive. How is the audio quality? And have you tried the v2 config? Thanks in advance.
Have you got good audio?
As discussed in issue https://github.com/rishikksh20/iSTFTNet-pytorch/issues/1, line 164 of `stft.py` was changed to https://github.com/rishikksh20/iSTFTNet-pytorch/blob/e928a6b604033a3857757562af36241f9225adfc/stft.py#L164 However, `inverse_transform.device()` raises the exception mentioned in the title, since `device` is an attribute rather than a method. It can be changed to `inverse_transform.device`...
https://arxiv.org/abs/2206.00208
https://github.com/rishikksh20/iSTFTNet-pytorch/blob/ecbf0f635b36432bd3e432790326591bc86cadbc/models.py#L97 https://github.com/rishikksh20/iSTFTNet-pytorch/blob/ecbf0f635b36432bd3e432790326591bc86cadbc/config_v1.json#L16 Why is fs 16?
The phase output of the generator currently can only range from -1 to 1, which is not enough as full phase in radians is expected later in `stft.inverse()` (either 0..2*pi...
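To make the range problem in the issue above concrete, here is a minimal sketch in plain Python (standard library only, not the repository's code). It assumes the raw phase output is bounded in [-1, 1] and shows one possible remedy, multiplying by pi, so that the full range of -pi..pi becomes reachable. The function names `scale_phase` and `can_reach` are hypothetical and used only for illustration.

```python
import cmath
import math


def scale_phase(raw, scale=math.pi):
    """Map a raw output in [-1, 1] to a phase in [-scale, scale] radians.

    Assumption: the raw value is already bounded in [-1, 1].
    """
    if not -1.0 <= raw <= 1.0:
        raise ValueError("raw phase output is expected to lie in [-1, 1]")
    return scale * raw


def can_reach(target_phase, scale, tol=1e-9):
    """Return True if some raw value in [-1, 1] produces target_phase.

    The target is first wrapped into (-pi, pi], because phases that
    differ by a multiple of 2*pi describe the same complex number.
    """
    wrapped = cmath.phase(cmath.exp(1j * target_phase))
    return abs(wrapped) <= scale + tol


if __name__ == "__main__":
    quarter_turn = math.pi / 2  # phase of the purely imaginary number 1j
    print("unscaled, reach pi/2:", can_reach(quarter_turn, scale=1.0))
    print("scaled by pi, reach pi/2:", can_reach(quarter_turn, scale=math.pi))
```

With `scale=1.0` a phase of pi/2 (about 1.57 rad) is out of reach because it exceeds 1 rad, while with `scale=math.pi` every wrapped phase is reachable. Whether scaling is the right fix for this particular model is a separate question that only training results can answer.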
Has anyone tried to directly model the complex numbers instead of the phase and magnitude? What would be the problem if we model the real and imaginary parts directly?
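As background for the question above, the two parameterizations describe the same complex spectrogram bin and can be converted into each other. The sketch below uses only Python's standard `cmath` module; the helper names `polar_to_rect` and `rect_to_polar` are hypothetical. It does not say which parameterization trains better, only that real and imaginary parts are unbounded values with no wrap-around, whereas phase is periodic.

```python
import cmath
import math


def polar_to_rect(magnitude, phase):
    """Convert (magnitude, phase in radians) to (real, imag)."""
    z = cmath.rect(magnitude, phase)
    return z.real, z.imag


def rect_to_polar(real, imag):
    """Convert (real, imag) to (magnitude, phase in (-pi, pi])."""
    return cmath.polar(complex(real, imag))


if __name__ == "__main__":
    # Two phases that differ by 2*pi give the same real/imag pair,
    # which is the periodicity a phase-predicting model has to cope with.
    a = polar_to_rect(1.0, 0.25)
    b = polar_to_rect(1.0, 0.25 + 2 * math.pi)
    print(a, b)
    print(rect_to_polar(0.0, 1.0))
```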