icefall
What are the different formats of data inputs for zipformer/pretrain in the SSL recipe? How can we obtain them?
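The icefall SSL recipes, like the ASR ones, consume lhotse cuts manifests, so a quick way to see what the pretraining input looks like is to open a manifest and inspect one cut. A minimal sketch, assuming a lhotse-based pipeline; the manifest path is hypothetical:

```python
# Inspect a lhotse cuts manifest to see what fields the recipe reads.
from lhotse import CutSet

cuts = CutSet.from_file("data/fbank/librispeech_cuts_train-clean-100.jsonl.gz")  # hypothetical path
cut = next(iter(cuts))
print(cut)         # id, duration, recording, supervisions, features, ...
print(cut.custom)  # recipe-specific extras (e.g. pseudo-labels) live here, if any
```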
How to run a streaming zipformer transducer model with my own dataset?
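For a custom dataset, the usual route is to build lhotse manifests yourself and then point the recipe's train.py at them. A minimal sketch for one wav/transcript pair; all paths, IDs, and the transcript are placeholders:

```python
# Build lhotse manifests for your own data and compute fbank features.
from lhotse import (CutSet, Fbank, Recording, RecordingSet,
                    SupervisionSegment, SupervisionSet)

recording = Recording.from_file("my_data/utt001.wav", recording_id="utt001")
supervision = SupervisionSegment(
    id="utt001", recording_id="utt001",
    start=0.0, duration=recording.duration, text="hello world",
)
cuts = CutSet.from_manifests(
    recordings=RecordingSet.from_recordings([recording]),
    supervisions=SupervisionSet.from_segments([supervision]),
)
# Extract and store fbank features referenced by the manifest.
cuts = cuts.compute_and_store_features(
    extractor=Fbank(), storage_path="my_data/fbank", num_jobs=1,
)
cuts.to_file("my_data/cuts_train.jsonl.gz")
```

In the librispeech zipformer recipe the streaming variant is selected at training time (via the `--causal 1` flag); the manifests themselves are the same as for the offline model.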
I would like to use Zipformer ("state of the art in speech recognition") to train a lower-frequency video recognition ("sign language") model based on the code provided in icefall/egs/librispeech/ASR/zipformer/train.py...
Hi, we trained a Zipformer model with approximately 20k hours of Hindi audio data, with files ranging from 2 to 14 seconds. The test data consists of longer audio files with extended...
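One common workaround when a model trained on short utterances degrades on long test audio is to split the long recordings into fixed-length windows before decoding. A hedged sketch using lhotse's `cut_into_windows`; the 30-second window and the paths are guesses, not recipe defaults:

```python
# Split long test cuts into shorter windows before decoding.
from lhotse import CutSet

cuts = CutSet.from_file("data/fbank/test_cuts.jsonl.gz")  # hypothetical path
short_cuts = cuts.cut_into_windows(duration=30.0)
short_cuts.to_file("data/fbank/test_cuts_30s.jsonl.gz")
```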
Borrowed a lot of code from https://github.com/JinZr/icefall/tree/dev/zipformer-xlstm
ValueError: lilcom: Length of string was too short [extra info] When calling: MonoCut.load_features(args=(MonoCut(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0-904459_sp1.1', start=0.0, duration=6.0533125, channel=0, supervisions=[SupervisionSegment(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', recording_id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', start=0.0, duration=6.0533125, channel=0, text='由市政府分管领导担任负责人', language='Chinese', speaker='KeSpeech_KeSpeech_000000352', gender=None, custom={'origin': 'aidatatang_200zh'}, alignment=None)], features=Features(type='kaldi-fbank', num_frames=605,...
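This lilcom error usually indicates a truncated or corrupted stored feature file. A hedged sketch for locating the bad entries: try to load each cut's features and write out a manifest containing only the cuts that load cleanly (paths are hypothetical):

```python
# Filter out cuts whose stored features cannot be decompressed.
from lhotse import CutSet

cuts = CutSet.from_file("data/fbank/kespeech_cuts_train.jsonl.gz")  # hypothetical path

def features_ok(cut) -> bool:
    try:
        cut.load_features()
        return True
    except Exception as e:
        print(f"dropping {cut.id}: {e}")
        return False

clean = cuts.filter(features_ok)
clean.to_file("data/fbank/kespeech_cuts_train_clean.jsonl.gz")
```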
[MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research](https://arxiv.org/pdf/2406.18301) The paper above has just open-sourced a dataset covering 15 languages; it is available at...
Hi everyone, I'm trying to download and prepare ReazonSpeech but I got this error:
2024-07-01 15:17:59 (prepare.sh:37:main) Running prepare.sh
2024-07-01 15:17:59 (prepare.sh:39:main) dl_dir: /home/hoang/PycharmProjects/icefall/egs/reazonspeech/ASR/download
2024-07-01 15:17:59 (prepare.sh:42:main) Stage 0: Download...
With zipformer I can get good performance. Currently, when I decode on CPU one utterance at a time (not batched), memory usage grows to about 2.5 GB. The token size is...
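Not icefall-specific, but two generic PyTorch knobs that often reduce CPU decoding memory are inference mode and dynamic int8 quantization of Linear layers. A sketch under the assumption that the stand-in `model` below is replaced by the loaded zipformer (the real forward signature differs per recipe), and with no claim about the accuracy cost:

```python
import torch
import torch.nn as nn

# Stand-in for the loaded zipformer; in practice this comes from the
# recipe's checkpoint-loading code.
model = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 512))
model.eval()

# Dynamic int8 quantization: Linear weights stored as int8, activations
# quantized on the fly. Often shrinks CPU memory noticeably; accuracy
# impact must be verified per model.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

features = torch.randn(1, 100, 80)  # placeholder fbank tensor (batch, frames, dim)
with torch.inference_mode():        # no autograd bookkeeping during decoding
    out = qmodel(features)
```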
Still a work in progress. See also [Contextual Position Encoding: Learning to Count What's Important](https://arxiv.org/abs/2405.18719)
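For reference, the CoPE idea from that paper replaces fixed token offsets with positions computed from the attention logits themselves: gate each key with a sigmoid of its logit, cumulatively sum the gates to get a fractional position, and linearly interpolate learned position embeddings between the neighbouring integers. A minimal sketch adapted from the paper's pseudocode; the causal mask is assumed to be applied to `attn_logits` upstream (masked entries at -inf give zero gates):

```python
import torch
import torch.nn as nn

class CoPE(nn.Module):
    """Contextual Position Encoding (arXiv:2405.18719), minimal sketch."""

    def __init__(self, npos_max: int, head_dim: int):
        super().__init__()
        self.npos_max = npos_max
        # one learnable embedding per integer position
        self.pos_emb = nn.Parameter(torch.zeros(1, head_dim, npos_max))

    def forward(self, query: torch.Tensor, attn_logits: torch.Tensor) -> torch.Tensor:
        # query: (batch, seq, head_dim); attn_logits: (batch, seq, seq) raw q.k scores
        gates = torch.sigmoid(attn_logits)            # g_ij in (0, 1)
        # fractional position of key j w.r.t. query i = sum of gates over j..i
        pos = gates.flip(-1).cumsum(dim=-1).flip(-1)
        pos = pos.clamp(max=self.npos_max - 1)
        # interpolate q . e[pos] between the two neighbouring integer positions
        pos_ceil = pos.ceil().long()
        pos_floor = pos.floor().long()
        logits_int = torch.matmul(query, self.pos_emb)  # (batch, seq, npos_max)
        logits_ceil = logits_int.gather(-1, pos_ceil)
        logits_floor = logits_int.gather(-1, pos_floor)
        w = pos - pos_floor                             # interpolation weight
        return logits_ceil * w + logits_floor * (1 - w) # add to attn_logits

# Toy usage: the returned position logits are added to the raw attention
# logits before the softmax.
b, t, d = 2, 8, 16
q = torch.randn(b, t, d)
logits = torch.randn(b, t, t)  # q.k^T; causal mask assumed already applied
cope = CoPE(npos_max=8, head_dim=d)
pos_logits = cope(q, logits)
```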