Sang-Hoon Lee

Results 9 comments of Sang-Hoon Lee

Thank you for your quick reply! When I compared the models trained with the complex MS-STFT and the real MS-STFT discriminator, they had similar performance in terms of Mel reconstruction error and PESQ....

Thank you for your interest. Actually, due to computational resource constraints, I stopped training the BigVGAN vocoder 😢 (I trained it for only 300k steps). When I evaluated it, BigVGAN has...

There are so many ways... First, check the preprocessing method for your Mel-spectrogram. Second, change the initial frequency value for resampling: https://github.com/sh-lee-prml/BigVGAN/blob/main/models_bigvgan.py#L104 Calculate this according to your sampling rate, hop size,...

Thank you... I have mis-implemented this parameter... I'll fix it right now. Thanks again

```python
self.alpha1 = nn.ParameterList(
    [nn.Parameter(torch.ones(1, channels, 1)) for i in range(len(self.convs1))]
)
```

I changed the alphas to ParameterList:

https://github.com/sh-lee-prml/BigVGAN/blob/main/models_bigvgan.py#L51
https://github.com/sh-lee-prml/BigVGAN/blob/main/models_bigvgan.py#L52
https://github.com/sh-lee-prml/BigVGAN/blob/main/models_bigvgan.py#L100
https://github.com/sh-lee-prml/BigVGAN/blob/main/models_bigvgan.py#L102
https://github.com/sh-lee-prml/BigVGAN/blob/main/models_bigvgan.py#L108

![image](https://user-images.githubusercontent.com/56749640/179433560-386eca1b-6b6e-4b5c-8fdc-ee4d5d3f9bc7.png)

Now, alpha is trainable 😢 Thank you again 👍
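For anyone hitting the same bug: the reason `nn.ParameterList` matters is that a plain Python list of `nn.Parameter` objects is never registered with the module, so the optimizer silently skips those parameters. A minimal standalone sketch (the `Demo` class and sizes here are made up for illustration, not from the repo):

```python
import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self, channels=4, n_blocks=3):
        super().__init__()
        # Plain Python list: these parameters are NOT registered with the
        # module, so model.parameters() never yields them and they stay frozen.
        self.alpha_plain = [nn.Parameter(torch.ones(1, channels, 1))
                            for _ in range(n_blocks)]
        # nn.ParameterList: these ARE registered and become trainable.
        self.alpha_list = nn.ParameterList(
            [nn.Parameter(torch.ones(1, channels, 1)) for _ in range(n_blocks)]
        )

m = Demo()
# Only the ParameterList entries show up: 3 blocks * (1*4*1) elements = 12.
n_trainable = sum(p.numel() for p in m.parameters())
print(n_trainable)  # 12
```

The same reasoning applies to `nn.ModuleList` vs a plain list of submodules.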

Hi @HaiFengZeng The official snake1d code initializes it to be greater than 0 via abs():

```python
a = torch.zeros_like(x[0]).normal_(mean=0, std=50).abs()
```

But I think it does not need to be greater than...

I referred to Appendix A (page 13) of the BigVGAN paper. **The upsampled feature is followed by M number of AMP residual blocks, where each AMP block uses different kernel sizes...
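The "M blocks with different kernel sizes" pattern can be sketched roughly as below. This is only an illustrative skeleton in the spirit of HiFi-GAN's multi-receptive-field fusion that BigVGAN's AMP blocks build on; the class name, channel sizes, and the plain residual Conv1d stand in for the real AMP block internals:

```python
import torch
import torch.nn as nn

class MRFStack(nn.Module):
    """Apply M parallel residual blocks with different kernel sizes
    to the upsampled feature and average their outputs (sketch only)."""
    def __init__(self, channels=8, kernel_sizes=(3, 7, 11)):
        super().__init__()
        # One conv per kernel size; odd kernels with k//2 padding
        # keep the sequence length unchanged.
        self.blocks = nn.ModuleList([
            nn.Conv1d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes
        ])

    def forward(self, x):
        # Residual connection per block, then average over the M blocks.
        out = sum(x + b(x) for b in self.blocks)
        return out / len(self.blocks)

x = torch.randn(1, 8, 16)
y = MRFStack()(x)
print(y.shape)  # torch.Size([1, 8, 16])
```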

Thank you for your concern about this issue. I agree with your idea about using a low-pass filter only once. However, in this case, I was confused about the activation function...

After some comparison, I found that both models (resampling once or twice) have similar performance. But when resampling twice, training/inference speed is much slower, so I changed it as...