SeongYeonPark

Results 4 issues of SeongYeonPark

When running the code below:

```
from scipy.io.wavfile import read
import sox
import numpy as np

path = 'input.wav'
sr, wav = read(path)
tfm = sox.Transformer()
tfm.set_globals(verbosity=0)
stretch_ratio = np.random.normal(1.05,0.125)
...
```

As far as I understand, the perplexity used in this repo's VQ-VAE is roughly the number of codebook tokens that are meaningfully in use. When only one codebook token is used, the perplexity is 1....
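To make that reading concrete, here is a minimal sketch of perplexity as the exponential of the entropy of the code-usage distribution. The function name `codebook_perplexity` and its arguments are hypothetical and not taken from the repo, and the repo's own computation may differ (for example, it may average over a batch).

```python
import numpy as np

def codebook_perplexity(indices, codebook_size):
    """Perplexity exp(H) of the empirical code-usage distribution.

    `indices` holds the codebook index chosen for each latent position.
    This is an illustrative helper, not the repo's implementation.
    """
    counts = np.bincount(np.asarray(indices).ravel(), minlength=codebook_size)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]  # skip unused codes to avoid log(0)
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(np.exp(entropy))
```

Under this definition, a single used token gives a perplexity of 1, and uniform usage of all K tokens gives a perplexity of K.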

In your paper, you report WER and CER of about 4.23% and 1.46%, and you mention that you used https://huggingface.co/facebook/hubert-large-ls960-ft as the ASR model. However, when using the same...
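For reference when comparing such numbers, here is a minimal sketch of how WER and CER are commonly computed from a Levenshtein edit distance. It is not the paper's evaluation script, and the helper names are hypothetical. Text normalization (casing, punctuation) is left to the caller and can change the scores noticeably.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```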

In the proposed SR-augmentation, when the mel is squeezed, you pad it with the highest frequency bin value and add Gaussian noise. Can you share the scale of the added...
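The padding step described in this question can be sketched as follows. This is only an illustration under stated assumptions: the mel-spectrogram is laid out as `(n_bins, T)` with the last row as the highest frequency bin, the function name `pad_squeezed_mel` is hypothetical, and `noise_scale` is a free parameter because the actual scale is exactly what the question asks about.

```python
import numpy as np

def pad_squeezed_mel(mel, n_mels, noise_scale, rng=None):
    """Pad a frequency-squeezed mel back to n_mels rows.

    Assumes mel has shape (n_bins, T) with n_bins <= n_mels and the last
    row holding the highest frequency bin. noise_scale is a placeholder:
    the value used by the authors is not known here.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_pad = n_mels - mel.shape[0]
    if n_pad <= 0:
        return mel[:n_mels]
    # Repeat the highest-frequency row, then perturb it with Gaussian noise.
    top = np.repeat(mel[-1:, :], n_pad, axis=0)
    noise = rng.normal(0.0, noise_scale, size=top.shape)
    return np.concatenate([mel, top + noise], axis=0)
```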