Yingzhi WANG

7 issues by Yingzhi WANG

Hi @mravanelli @TParcollet, this PR adds a standard CTC + wav2vec recipe for AISHELL-1. Test CER: 5.06%; Dev CER: 4.52%. Some points: 1. [chinese-wav2vec2-large](https://huggingface.co/TencentGameMate/chinese-wav2vec2-large) (from Tencent) is used, which is pretrained...
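The CER figures above are character error rates. As a rough sketch of how such a number is usually computed (this is not the recipe's own scoring code), CER is the character-level edit distance summed over the evaluation set, divided by the total number of reference characters:

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            )
        prev = curr
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total
```

For example, `cer(["今天天气很好"], ["今天天汽很好"])` gives 1/6, since there is one substituted character out of six reference characters.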

Hi @sharvil @Andrechang @JCBrouwer, thanks for this implementation. My question is about the training time for unconditional generation: it takes me about 5 hours per epoch on a single RTX8000, and...

This PR adds mel lengths to the outputs of FastSpeech2 inference.

enhancement
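Some background on why mel lengths are useful: in FastSpeech2 the length regulator repeats each phoneme encoding by its predicted duration, so an utterance's true mel length is the sum of its durations. Once a batch is padded to its longest item, that information is lost unless it is returned explicitly. The pure-Python sketch below illustrates the idea only; `length_regulate` and `regulate_batch` are hypothetical names, not the SpeechBrain API, and durations are assumed to already be rounded to integer frame counts.

```python
from typing import List, Tuple

Frame = List[float]


def length_regulate(encodings: List[Frame], durations: List[int]) -> List[Frame]:
    """Repeat each phoneme encoding by its predicted duration (in mel frames)."""
    frames: List[Frame] = []
    for enc, d in zip(encodings, durations):
        # Negative durations are clamped to zero, so padded phonemes add no frames.
        frames.extend([list(enc) for _ in range(max(int(d), 0))])
    return frames


def regulate_batch(
    batch_encodings: List[List[Frame]],
    batch_durations: List[List[int]],
    dim: int,
    pad_value: float = 0.0,
) -> Tuple[List[List[Frame]], List[int]]:
    """Expand every utterance, pad to the longest one, and return the true mel lengths."""
    expanded = [length_regulate(e, d) for e, d in zip(batch_encodings, batch_durations)]
    mel_lengths = [len(x) for x in expanded]
    max_len = max(mel_lengths, default=0)
    padded = [
        x + [[pad_value] * dim for _ in range(max_len - len(x))] for x in expanded
    ]
    return padded, mel_lengths
```

With `batch_encodings = [[[1.0], [2.0]], [[3.0]]]` and `batch_durations = [[2, 3], [4]]`, both padded outputs have 5 frames, while the returned mel lengths are `[5, 4]`.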

Hi, following #308, this PR adds support for the speech emotion diarization model. The new `audio_diarization` interface can also benefit other diarization tasks (emotion, speaker, VAD, and other speech events). For more details, see [speechbrain/emotion-diarization-wavlm-large](https://huggingface.co/speechbrain/emotion-diarization-wavlm-large). Thanks!
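A diarization interface typically ends by turning frame-level class predictions into time-stamped segments. The sketch below shows only that merging step. `frames_to_segments` is a hypothetical helper, not part of the `audio_diarization` interface. It assumes one predicted label per frame and a 20 ms frame shift, which corresponds to the WavLM feature stride at 16 kHz.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]


def frames_to_segments(labels: List[str], frame_shift: float = 0.02) -> List[Segment]:
    """Merge consecutive frames with the same label into (start_s, end_s, label)."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append(
                (round(start * frame_shift, 3), round(i * frame_shift, 3), labels[start])
            )
            start = i
    return segments
```

For example, six frames labelled `["neutral", "neutral", "happy", "happy", "happy", "neutral"]` produce three segments: neutral from 0.0 to 0.04 s, happy from 0.04 to 0.1 s, and neutral from 0.1 to 0.12 s.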

Hi, first of all, thanks for this awesome work. I'm trying to rewrite the training code for ltu-as, and I found that the `cutoff_len` for stages 1 and 2 is 108, which...

question

Hi @mravanelli, here's the ltu-as PR as discussed. I am collecting several new datasets and will start a new round of training, but this may take time, so in the meantime I...

enhancement

Hi, I tested WavLLM and found that it can sometimes understand non-speech audio sounds too. Since all the training data mentioned in the paper are speech-related, I wonder where it comes...