bmilde
Sure: https://github.com/espnet/espnet/pull/4922
Hi @sangeet2020, I've trained the German models for my speechcatcher ASR project (https://github.com/speechcatcher-asr). You'll find everything related to obtaining the data, as well as my ESPnet recipe, in https://github.com/speechcatcher-asr/speechcatcher-data (see Section...
@sangeet2020 The XL model was trained on about 30,000 hours of speech data ;) I rented a machine with 4x 3090 GPUs on vast.ai for this project and then...
Btw, end-to-end punctuation worked better than expected with the speechcatcher models! It's much better than my previous Kaldi setup, where I did punctuation with an additional model in post-processing.
The speechcatcher model sizes I was referring to are all just different YAML configs; see https://github.com/speechcatcher-asr/espnet/tree/egs2-speechcatcher-de/egs2/speechcatcher/asr1/conf It would be cool if you could get the recipe running for other languages than...
If you're using `asr.sh` for training, there's a parameter `max_wav_duration` whose default is rather low imho (only 20 seconds). My understanding is that this cuts any training...
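To make the effect of such a duration limit concrete, here is a minimal sketch, assuming the filter works per utterance on an `utt2num_samples`-style mapping (utterance id to number of samples) and keeps only utterances strictly between a minimum and maximum length. The helper `filter_by_duration`, its `min_wav_duration` default and the sample data are hypothetical; this is not ESPnet's actual code.

```python
from typing import Dict


def filter_by_duration(
    utt2num_samples: Dict[str, int],
    fs: int = 16000,
    min_wav_duration: float = 0.1,
    max_wav_duration: float = 20.0,
) -> Dict[str, int]:
    """Hypothetical helper: keep utterances whose length lies strictly
    between min_wav_duration and max_wav_duration (both in seconds)."""
    # Convert the duration limits from seconds to samples.
    min_len = min_wav_duration * fs
    max_len = max_wav_duration * fs
    return {
        utt: n for utt, n in utt2num_samples.items()
        if min_len < n < max_len
    }


if __name__ == "__main__":
    fs = 16000
    # Made-up utterances of 10 s, 25 s and 60 s.
    utts = {"short": 10 * fs, "long": 25 * fs, "very_long": 60 * fs}
    # With a 20 s limit, only the 10 s utterance survives.
    print(sorted(filter_by_duration(utts, fs=fs)))
    # Raising the limit to 30 s also keeps the 25 s utterance.
    print(sorted(filter_by_duration(utts, fs=fs, max_wav_duration=30.0)))
```

With long-form training data, a 20 second cap can silently discard a large share of the corpus, which is why it is worth checking this value before a long training run. In `asr.sh` the value can be overridden on the command line, e.g. `--max_wav_duration 30`.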
Hi, sorry for the long wait. Code looks good, thanks!
Please don't remove the .flac feature; it's super useful on space-constrained systems with large datasets! Maybe it is only some combination of package versions and Python version that triggers...
The bug occurred on a GPU machine I rented through a cloud provider to train my models. It might have had an older version of libsndfile installed. Here is more...
I got it running without errors by adding `.clone()` to:
```
mlstm_kernels/torch/chunkwise/native/fw.py, line 116, in mlstm_chunkwise__recurrent_fw_C
    vecN_k_next = scaGbar_k * vecN_k + matK_chunk_gated.transpose(-2, -1).sum(-1)
```
=> `vecN_k_next = scaGbar_k *...