StyleTTS2
weird pulse at the end of the model
As titled, there are a lot of comments in the code that say "weird pulse at the end of the model". I would just like to know whether this has been fixed.
I use whisper-large-v3 for adversarial training instead of WavLM. The pulse disappeared after fine-tuning. Hope it helps.
@Ziyueyork I have tried train_finetune.py with slm: model: 'openai/whisper-large-v3', but I get this error: "Whisper expects the mel input features to be of length 3000, but found 20000. Make sure to pad the input mel features to 3000."
Do you have a suggestion for how to get the right shape? I use max_len: 100.
I guess you are passing the raw waveform to the Whisper model, the same way the original code passes it to WavLM. WavLM accepts a raw waveform, but Whisper needs a log-mel spectrogram as input, so you need to add feature extraction code to the WavLMLoss class in losses.py.
Here is some reference code for feature extraction: https://github.com/huggingface/transformers/blob/v4.40.2/src/transformers/models/whisper/feature_extraction_whisper.py
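To make the shape mismatch concrete, here is a minimal sketch of the arithmetic behind the error. It is not the actual fix and not code from the repository. It assumes the 20000 in the error message is the raw sample count of one training clip, and it uses Whisper's published front-end constants (16 kHz audio, hop length 160, 30-second window). The helper names `pad_or_trim` and `expected_frames` are hypothetical, and the log-mel computation itself is left out; the linked feature extraction code covers that part.

```python
# Whisper's fixed front-end settings: 16 kHz audio, 10 ms hop, 30 s window.
SAMPLE_RATE = 16_000
HOP_LENGTH = 160
CHUNK_SECONDS = 30
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # samples in one 30 s window
N_FRAMES = N_SAMPLES // HOP_LENGTH       # mel frames the encoder expects


def pad_or_trim(wave, length=N_SAMPLES):
    """Hypothetical helper: zero-pad or cut a 1-D list of samples to `length`."""
    wave = list(wave[:length])
    return wave + [0.0] * (length - len(wave))


def expected_frames(num_samples, hop_length=HOP_LENGTH):
    """Number of mel frames produced for `num_samples` at this hop length."""
    return num_samples // hop_length


# Assumption: a short training clip of 20000 samples (about 1.25 s at 16 kHz).
clip = [0.0] * 20_000
padded = pad_or_trim(clip)

print(len(clip), expected_frames(len(clip)))      # 20000 125
print(len(padded), expected_frames(len(padded)))  # 480000 3000
```

So a raw 20000-sample clip is neither a mel spectrogram nor the right length. Under these assumptions, the waveform would need to be padded to 30 seconds and converted to log-mel features before it reaches the Whisper encoder.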
@Ziyueyork Can you please share your modified WavLMLoss code?
@matmult It has not been fixed in the code yet, but if you are experiencing this issue you may refer to this comment for a solution.