StyleTTS2 weird pulse at the end of the model

Open matmult opened this issue 11 months ago • 5 comments

As the title says, a lot of comments in the code say "weird pulse at the end of the model". I would just like to know whether this has been fixed.

Mar 14 '24 05:03 matmult

I use whisper-large-v3 instead of WavLM for adversarial training. The pulse disappeared after fine-tuning. Hope it helps.

May 16 '24 10:05 Ziyueyork

@Ziyueyork I have tried train_finetune.py with slm: model: 'openai/whisper-large-v3', but I get the error "Whisper expects the mel input features to be of length 3000, but found 20000. Make sure to pad the input mel features to 3000." Do you have a suggestion for how to get the input into the right shape? I use max_len: 100.

May 28 '24 18:05 mocialov

I guess you are passing the raw waveform to the Whisper model, the same way the original code passes it to WavLM. WavLM accepts that, but Whisper needs a log-mel spectrogram as input, so you need to add feature-extraction code to the WavLMLoss class in losses.py. Here is reference code for the feature extraction: https://github.com/huggingface/transformers/blob/v4.40.2/src/transformers/models/whisper/feature_extraction_whisper.py
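To illustrate the shape issue, here is a minimal sketch (not the actual modified WavLMLoss code) that turns a 20000-sample, 16 kHz waveform into the padded log-mel features Whisper expects. The helper name `wave_to_whisper_features` is hypothetical, and building the extractor locally with 128 mel bins is an assumption meant to match whisper-large-v3. Note that this path runs in NumPy, so it only demonstrates the expected input shape; if gradients have to flow back to the generator during training, the same steps would presumably need to be redone with differentiable torch operations.

```python
import numpy as np
from transformers import WhisperFeatureExtractor

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz audio

# Built locally, no download needed. 128 mel bins is an assumption
# chosen to match whisper-large-v3 (the class default is 80).
extractor = WhisperFeatureExtractor(feature_size=128, sampling_rate=SAMPLE_RATE)


def wave_to_whisper_features(wave_16k):
    """Hypothetical helper: 1-D float waveform at 16 kHz -> log-mel features.

    The extractor pads (or truncates) the audio to 30 s, so the output
    always has 3000 frames, which is the length Whisper checks for.
    """
    out = extractor(wave_16k, sampling_rate=SAMPLE_RATE, return_tensors="np")
    return out.input_features


# Dummy waveform with the same length as in the error message above.
wave = np.random.randn(20_000).astype(np.float32)
features = wave_to_whisper_features(wave)
print(features.shape)  # (1, 128, 3000)
```

The resulting array (converted to a tensor) is what would be passed to the Whisper encoder as input_features, instead of the raw waveform.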

May 30 '24 01:05 Ziyueyork

@Ziyueyork Can you please share your modified WavLMLoss code?

Jul 12 '24 08:07 mc-marcocheng

@matmult It has not been fixed in the code yet, but you may refer to this comment for a solution if you're experiencing this issue.

Sep 01 '24 18:09 martinambrus
