
279 icefall issues

What are the different formats of data inputs for zipformer/pretrain in the SSL recipe? How can we obtain them?

How to run a streaming zipformer transducer model with my own dataset?

I would like to use Zipformer (state of the art in speech recognition) to train a lower-frequency video recognition model for sign language, based on the code provided in icefall/egs/librispeech/ASR/zipformer/train.py...

Hi, we trained a Zipformer model on approximately 20k hours of Hindi audio data, with files ranging between 2 and 14 seconds. The test data consists of longer audio files with extended...

Borrowed a lot of code from https://github.com/JinZr/icefall/tree/dev/zipformer-xlstm

ValueError: lilcom: Length of string was too short [extra info] When calling: MonoCut.load_features(args=(MonoCut(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0-904459_sp1.1', start=0.0, duration=6.0533125, channel=0, supervisions=[SupervisionSegment(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', recording_id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', start=0.0, duration=6.0533125, channel=0, text='由市政府分管领导担任负责人', language='Chinese', speaker='KeSpeech_KeSpeech_000000352', gender=None, custom={'origin': 'aidatatang_200zh'}, alignment=None)], features=Features(type='kaldi-fbank', num_frames=605,...

[MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research](https://arxiv.org/pdf/2406.18301) The above paper has just open-sourced a dataset for 15 languages and is available at...

Hi everyone, I'm trying to download and prepare reazonspeech but I get this error: 2024-07-01 15:17:59 (prepare.sh:37:main) Running prepare.sh 2024-07-01 15:17:59 (prepare.sh:39:main) dl_dir: /home/hoang/PycharmProjects/icefall/egs/reazonspeech/ASR/download 2024-07-01 15:17:59 (prepare.sh:42:main) Stage 0: Download...

With Zipformer I get good performance. Currently, when I decode on CPU one utterance at a time (not using batching), memory usage grows to 2.5 GB. The token size is...

Still a work in progress. See also [Contextual Position Encoding: Learning to Count What's Important](https://arxiv.org/abs/2405.18719)