Shawn Zhang issues

Results: 7 issues of Shawn Zhang

Improving Performance on Shorter Audio Clips

Using your GPVAD/VADC, I wish to process short chunks (~200 ms) of audio files. However, at durations this short, the VAD performs poorly. What...

Thoughts on streaming the forward pass?

This is some really good work! I have a question: Have you tried using your algorithm to process an audio _stream_? How would performance be affected? And how feasible would...

Generated Samples Have an Audible Pop at End

I'm curious if anyone else has experienced this problem. When training the MB-MelGAN, I use `batch_max_steps: 8192` as the length of the data. After training, although the audio quality...

Output audio duration does not exactly match input audio.

Running through your pre-trained models, I found that the generated audio does not exactly match the input in duration. For example, ``` wav, sr = load_wav(os.path.join(a.input_wavs_dir, filname)) wav = wav...

Why use Audio2Mel's method for extracting the mel spectrogram?

Audio2Mel does the following to extract the mel spectrogram: ``` data, sampling_rate = load(full_path, sr=self.sampling_rate) data = 0.95 * normalize(data) if self.augment: amplitude = np.random.uniform(low=0.3, high=1.0) data = data *...

How to Perform Spec Augmentation on Batch Size > 1?

For example, I have a tensor with a batch size of 2, e.g. of shape `torch.Size([2, 256, 512])`. Now, when I run the command through ``` spec_augment_pytorch.spec_augment(mel_spectrogram=mel_spectrogram) ``` I have...

Adding CDPAM to MelGAN Training

Hello, thank you for sharing this fantastic work! I was curious if you could please share the code you used to integrate CDPAM as an additional loss for the...