Yingzhi WANG issues

Results: 7 issues of Yingzhi WANG

Add CTC recipe to AISHELL-1

Hi @mravanelli @TParcollet, this PR adds a typical CTC-wav2vec recipe to AISHELL-1. Test CER: 5.06%; Dev CER: 4.52%. Some points: 1. [chinese-wav2vec2-large](https://huggingface.co/TencentGameMate/chinese-wav2vec2-large) (from Tencent) is used, which is pretrained...
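For readers unfamiliar with the metric quoted above: character error rate (CER) is conventionally the character-level edit distance between hypothesis and reference, divided by the number of reference characters. The sketch below is a plain-Python illustration of that definition, not the scoring code used by the recipe; the function names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference symbols to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER in percent: total edits / total reference characters."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


# One substituted character out of four reference characters.
print(cer(["你好世界"], ["你好世间"]))
```

Under this definition, a CER of 5.06% means roughly five character edits per hundred reference characters, aggregated over the whole test set.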

Unconditional Generation Training Time

Hi @sharvil @Andrechang @JCBrouwer, thanks for this implementation. My issue is about the training time for unconditional generation. It takes me about 5 hours/epoch on 1 × RTX8000 and...

Return mel lengths from the FastSpeech2 inferencers

This PR adds mel lengths to the outputs of the FastSpeech2 inferencers.

enhancement

add SpeechBrain Speech-Emotion-Diarization interface

Hi, following #308, this PR adds support for the speech-emotion-diarization model. This `audio_diarization` interface can benefit other diarization tasks (emotion/speaker/VAD/other speech events). For more details: [speechbrain/emotion-diarization-wavlm-large](https://huggingface.co/speechbrain/emotion-diarization-wavlm-large) Thanks!

question on cutoff_len

Hi, first, thanks for this awesome work. I'm trying to rewrite the training code for ltu-as, and I find that the `cutoff_len` for stages 1 and 2 is 108 which...

question

Add recipe for audio/speech LLM (ltu-as with llama3)

Hi @mravanelli, here's the ltu-as PR as discussed. I am collecting several new datasets and will start a new round of training but this may take time, so meanwhile I...

enhancement

Why can WavLLM understand audio sounds as well?

Hi, I tested and found that WavLLM can sometimes understand audio sounds too. Seeing that all the training data mentioned in the paper are speech-related, I just wonder where comes...