
Training with disfluencies in speech

Open duhtapioca opened this issue 1 year ago • 0 comments

Hi

We're looking to finetune a streaming zipformer model on our custom dataset of around 100 hours, which we are about to have manually annotated. The speech in that dataset may contain disfluencies. In this case, is it better to keep the disfluencies in the annotations, or should we omit them from the transcripts?

From the CSJ experiments in #892, we infer that the model trained and tested on fluent transcripts performs slightly better. Is this inference correct? For zipformer, should we expect similar results, or is training with disfluent transcriptions worth a shot? If so, what would be the ideal format for annotating disfluent speech for zipformer?
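To make the two options concrete: one common approach is to annotate disfluencies with inline tags so that either transcript style can be derived later. The sketch below is only an illustration of that idea, assuming a hypothetical bracket-tag convention (e.g. `[uh]`, `[um]`) rather than any format icefall itself prescribes:

```python
import re

def strip_disfluencies(transcript: str) -> str:
    """Produce a fluent transcript from one annotated with hypothetical
    bracketed disfluency tags such as "[uh]" or "[um]"."""
    # Replace each bracketed tag with a space, then collapse whitespace.
    cleaned = re.sub(r"\[[^\]]+\]", " ", transcript)
    return re.sub(r"\s+", " ", cleaned).strip()

annotated = "so [uh] we want to [um] finetune the model"
print(strip_disfluencies(annotated))
# -> "so we want to finetune the model"
```

Annotating once with tags and stripping them in a preprocessing step keeps both the fluent and disfluent training targets available from the same labeling pass.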

Any advice on this would be of great help.

Thanks!

duhtapioca · Jul 24 '24 10:07