StyleTTS
MelSpectrogram() and unspecified sampling rate
In meldataset.py, all wav files are resampled to 24000 Hz. However, the MelSpectrogram() transform is constructed without a sample_rate argument, so it defaults to 16000 Hz.
to_mel = torchaudio.transforms.MelSpectrogram(
    n_mels=80, n_fft=2048, win_length=1200, hop_length=300)
mean, std = -4, 4

def preprocess(wave):
    wave_tensor = torch.from_numpy(wave).float()
    mel_tensor = to_mel(wave_tensor)
    mel_tensor = (torch.log(1e-5 + mel_tensor.unsqueeze(0)) - mean) / std
    return mel_tensor
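To make the mismatch concrete, here is a rough pure-Python sketch of where the 80 mel band edges end up. It assumes the torchaudio defaults as I understand them (f_min=0, f_max=sample_rate // 2, HTK mel scale) and that FFT bin k of 24000 Hz audio sits at k * 24000 / n_fft Hz; the helper names are made up for illustration and are not part of torchaudio or StyleTTS.

```python
import math

def hz_to_mel(f):
    # HTK mel formula (assumed to match torchaudio's default mel_scale="htk")
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(sample_rate, n_mels=80, f_min=0.0):
    # Hypothetical helper: n_mels + 2 points equally spaced on the mel axis
    # between f_min and sample_rate // 2, converted back to Hz.
    f_max = float(sample_rate // 2)
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_mels + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_mels + 2)]

AUDIO_SR = 24000    # rate the wav files are resampled to
DEFAULT_SR = 16000  # rate the transform assumes when sample_rate is omitted

# Edges the filterbank believes it has (labelled in Hz for 16000 Hz audio).
nominal = mel_band_edges(DEFAULT_SR)
# The same FFT bins carry 24000 Hz audio, so real frequencies are 1.5x larger.
actual = [f * AUDIO_SR / DEFAULT_SR for f in nominal]
# Edges that sample_rate=24000 would have produced.
intended = mel_band_edges(AUDIO_SR)

for i in (1, 40, 80):
    print(f"edge {i}: actual {actual[i]:.1f} Hz, intended {intended[i]:.1f} Hz")
```

Under these assumptions both variants cover 0 to 12000 Hz, but the bands in between are placed differently, so the features are not on a true mel scale. The output shape is the same either way.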
Questions:
- Was leaving sample_rate at the 16000 default, instead of passing 24000, an oversight?
- How were the mean/std values of -4 and 4 arrived at?
yl4579/StarGANv2-VC#10 and #57 should be helpful.