Shawn Zhang
Using your GPVAD/VADC, I wish to process smaller chunks of audio files (about 200 ms each). However, when the chunks are this short, the performance of the VAD is poor. What...
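For reference, here is a minimal sketch of how such short chunks might be cut from a waveform before being passed to a VAD. It is not taken from the GPVAD code: `split_into_chunks`, the 16 kHz sample rate, and the zero-padding of the last chunk are all assumptions made for illustration.

```python
import numpy as np

def split_into_chunks(samples, sample_rate, chunk_ms=200):
    """Split a 1-D signal into consecutive fixed-length chunks.

    Hypothetical helper: the final chunk is zero-padded so that every
    chunk has the same length.
    """
    chunk_len = int(sample_rate * chunk_ms / 1000)
    n_chunks = -(-len(samples) // chunk_len)  # ceiling division
    padded = np.zeros(n_chunks * chunk_len, dtype=samples.dtype)
    padded[:len(samples)] = samples
    return padded.reshape(n_chunks, chunk_len)

# One second of silence at an assumed 16 kHz sample rate.
audio = np.zeros(16000, dtype=np.float32)
chunks = split_into_chunks(audio, sample_rate=16000)
print(chunks.shape)  # (5, 3200)
```

Each row of `chunks` is one 200 ms segment that could be scored independently, which is the setting the question describes.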
This is some really good work! I have a question: Have you tried using your algorithm to process an audio _stream_? How would performance be affected? And how feasible would...
I'm curious if anyone else has experienced this problem. When training the MB-MelGAN, I use the `batch_max_steps: 8192` as the length of the data. After training, although the audio quality...
Running through your pre-trained models, I found that the generated audio does not exactly match the input in duration. For example, ``` wav, sr = load_wav(os.path.join(a.input_wavs_dir, filname)) wav = wav...
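One plausible cause of a small length mismatch, which the excerpt above does not confirm, is frame rounding: if the mel spectrogram has one frame per full hop and the vocoder emits exactly `hop_size` samples per frame, any trailing partial hop is dropped. The sketch below only illustrates that arithmetic; `vocoder_output_length` and the hop size of 256 are assumptions, not values taken from the repository.

```python
def vocoder_output_length(num_samples, hop_size=256):
    """Expected output length if each mel frame maps to hop_size samples.

    Assumes one frame per full hop (no extra centered frame), so any
    trailing partial hop in the input is lost.
    """
    num_frames = num_samples // hop_size
    return num_frames * hop_size

n_in = 22050  # e.g. one second at 22.05 kHz
n_out = vocoder_output_length(n_in)
print(n_in - n_out)  # 34 samples shorter
```

Under these assumptions the output is never longer than the input and is shorter by fewer than `hop_size` samples.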
Audio2Mel does the following to extract the mel spectrogram: ``` data, sampling_rate = load(full_path, sr=self.sampling_rate) data = 0.95 * normalize(data) if self.augment: amplitude = np.random.uniform(low=0.3, high=1.0) data = data *...
For example, I have a tensor with a batch size of 2, e.g. of shape `torch.Size([2, 256, 512])`. Now, when I pass it through ``` spec_augment_pytorch.spec_augment(mel_spectrogram=mel_spectrogram) ``` I have...
Hello, thank you for sharing this fantastic work! I was curious if you could please share the code you used to integrate CDPAM as an additional loss for the...