bmilde comments

Results: 54 comments by bmilde

librosa.util.exceptions.ParameterError: Window size mismatch: 512 != 400 when using streaming transformer model inference on CPU

Sure: https://github.com/espnet/espnet/pull/4922

Hi @sangeet2020, I've trained the German models for my speechcatcher ASR project (https://github.com/speechcatcher-asr). You'll find everything related to obtaining the data, as well as my ESPnet recipe, in https://github.com/speechcatcher-asr/speechcatcher-data (see Section...

@sangeet2020 The XL model was trained on about 30,000 hours of speech data ;) For this project I rented a machine with 4x RTX 3090 GPUs on vast.ai and then...

Btw, end-to-end punctuation worked better than expected with the speechcatcher models! It's much better than my previous Kaldi setup, where punctuation was added by a separate model in post-processing.

The speechcatcher model sizes I was referring to were all just different YAML configs; see https://github.com/speechcatcher-asr/espnet/tree/egs2-speechcatcher-de/egs2/speechcatcher/asr1/conf. It would be cool if you could get the recipe running for languages other than...

Incomplete transcription for longer audios in streaming asr

If you're using `asr.sh` for training, there's a parameter, `max_wav_duration`, whose default value is quite low imho (only 20 seconds). My understanding is that this cuts any training...
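A minimal sketch of that kind of duration limit, under my own assumptions: it treats `max_wav_duration` (plus a lower bound, `min_wav_duration`) as a filter over a Kaldi-style `utt2num_samples` file at 16 kHz, so utterances outside the range are dropped from the training list rather than shortened. The helpers `parse_utt2num_samples` and `keep_by_duration` are hypothetical illustrations, not ESPnet code, and the default values are assumptions.

```python
def parse_utt2num_samples(lines):
    """Parse Kaldi-style 'utt_id num_samples' lines into a dict (hypothetical helper)."""
    table = {}
    for line in lines:
        utt_id, num_samples = line.split()
        table[utt_id] = int(num_samples)
    return table


def keep_by_duration(utt2num_samples, fs=16000, min_wav_duration=0.1, max_wav_duration=20.0):
    """Return ids whose duration lies strictly between min and max seconds.

    Utterances outside the range are dropped entirely, not truncated.
    Defaults are assumed values for illustration.
    """
    min_len = min_wav_duration * fs
    max_len = max_wav_duration * fs
    return [utt for utt, n in utt2num_samples.items() if min_len < n < max_len]


# 5 s, 25 s and 60 s of audio at 16 kHz
data = parse_utt2num_samples(["utt_a 80000", "utt_b 400000", "utt_c 960000"])
print(keep_by_duration(data))                         # ['utt_a']
print(keep_by_duration(data, max_wav_duration=90.0))  # ['utt_a', 'utt_b', 'utt_c']
```

If the script behaves like this, passing a larger limit on the command line (e.g. `--max_wav_duration 60`) should keep the longer utterances; it's worth checking how your version of `asr.sh` applies the limit before relying on that.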

Add a new kind of alignment lattice in speechocean762

Hi, sorry for the long wait. Code looks good, thanks!

Severely degraded sound quality with file format "flac" in format_wav_scp.py, potential soundfile.write bug, clicking noises

Please don't remove the .flac feature; it's super useful for large datasets on space-constrained systems! Maybe only some combination of package versions and Python version triggers...

The bug occurred on a GPU machine I rented through a cloud provider to train my models. It might have had an older version of libsndfile installed; here is more...

Stateful training doesn't seem to work

I got it running without errors by adding `.clone()` to:

```
mlstm_kernels/torch/chunkwise/native/fw.py, line 116, in mlstm_chunkwise__recurrent_fw_C
vecN_k_next = scaGbar_k * vecN_k + matK_chunk_gated.transpose(-2, -1).sum(-1)
```

=> `vecN_k_next = scaGbar_k *...`
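One common reason `.clone()` makes an error like this go away, shown as a generic PyTorch sketch (not the mLSTM kernel code; the exact fix above is elided): autograd saves some inputs of an operation for the backward pass, and if one of those tensors is later modified in place, `backward()` raises a `RuntimeError`. Cloning the tensor before it is used means autograd saves the copy, so a later in-place update no longer touches it. The function `run` and its tensors are hypothetical.

```python
import torch


def run(use_clone: bool) -> torch.Tensor:
    """Return x.grad; raises RuntimeError when use_clone is False."""
    x = torch.ones(3, requires_grad=True)
    y = x * 2  # stands in for a recurrent state tensor
    # Multiplication saves its inputs for backward; with use_clone=True,
    # autograd saves an independent copy instead of y itself.
    a = y.clone() if use_clone else y
    z = a * a
    # In-place update of y afterwards, e.g. overwriting a state buffer.
    y.add_(1)
    z.sum().backward()
    return x.grad


print(run(use_clone=True))  # tensor([8., 8., 8.])
try:
    run(use_clone=False)
except RuntimeError as err:
    print("without clone:", err)
```

The clone costs an extra copy of the tensor, which is usually cheap compared to the recurrent computation itself.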