
279 icefall issues

What are the different formats of data inputs for zipformer/pretrain in the SSL recipe? How can we obtain them?

How to run a streaming zipformer transducer model with my own dataset?

I would like to use Zipformer (state of the art in speech recognition) to train a lower-frequency video recognition model for sign language, based on the code provided in icefall/egs/librispeech/ASR/zipformer/train.py...

Hi, we trained a Zipformer model on approximately 20k hours of Hindi audio data, with files ranging between 2 and 14 seconds. The test data consists of longer audio files with extended...

Borrowed a lot of code from https://github.com/JinZr/icefall/tree/dev/zipformer-xlstm

ValueError: lilcom: Length of string was too short [extra info] When calling: MonoCut.load_features(args=(MonoCut(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0-904459_sp1.1', start=0.0, duration=6.0533125, channel=0, supervisions=[SupervisionSegment(id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', recording_id='KeSpeech_KeSpeech_000000352_1011480_9a450cb0_sp1.1', start=0.0, duration=6.0533125, channel=0, text='由市政府分管领导担任负责人', language='Chinese', speaker='KeSpeech_KeSpeech_000000352', gender=None, custom={'origin': 'aidatatang_200zh'}, alignment=None)], features=Features(type='kaldi-fbank', num_frames=605,...

[MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research](https://arxiv.org/pdf/2406.18301) The above paper has just open-sourced a dataset for 15 languages and is available at...

Hi everyone, I'm trying to download and prepare reazonspeech but I get this error: 2024-07-01 15:17:59 (prepare.sh:37:main) Running prepare.sh 2024-07-01 15:17:59 (prepare.sh:39:main) dl_dir: /home/hoang/PycharmProjects/icefall/egs/reazonspeech/ASR/download 2024-07-01 15:17:59 (prepare.sh:42:main) Stage 0: Download...

With Zipformer I get good performance. Currently, when I decode on CPU one utterance at a time (not using batching), memory usage grows to 2.5 GB. The token size is...

Still a work in progress. See also [Contextual Position Encoding: Learning to Count What's Important](https://arxiv.org/abs/2405.18719)