WhisperS2T Fix for small segments

Patch

Fix for small segments, when the audio duration is less than max_seg_len
Fallback for generate_segment_batched in case seq_len and seq_metadata are not provided

Apr 05 '24 11:04 Pranjalya

I like it!

May 25 '24 02:05 BBC-Esq

Great fix. Without it, WhisperS2T is useless for short-duration audio.

HIGHLY recommend merging this pull request :)

Jun 12 '24 16:06 Sembiance

Hi @Pranjalya @Sembiance! Can you describe the problem with small duration audio here, or link a related issue?

Jul 06 '24 05:07 shashikg

Hey @shashikg, the issue was in the loop where we segment the audio into parts, in the case where the original audio's duration is < 1s. Setting the end timestamp to int(audio_duration) truncates it to 0, and range with an end of 0 yields nothing, so no segments are produced. Using math.ceil instead rounds the duration up to the next integer, so the audio segment's timestamp is logged. This bug is potentially dangerous as well if someone is using indexing to map the audio segments, since it leads to missing parts.
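
To illustrate the point about int versus math.ceil, here is a minimal standalone sketch. It is not the actual WhisperS2T code: the function name segment_starts and its round_up flag are made up for this example, and max_seg_len is assumed here to be the segment step in whole seconds.

```python
import math


def segment_starts(audio_duration, max_seg_len, round_up):
    """Return segment start timestamps (seconds) for a clip.

    Hypothetical helper for illustration only; it mirrors the
    range-based loop described above, not the library's real code.
    """
    # int() truncates toward zero, math.ceil() rounds up.
    end = math.ceil(audio_duration) if round_up else int(audio_duration)
    return list(range(0, end, max_seg_len))


# A 0.6 s clip: int(0.6) == 0, so range(0, 0, 30) is empty.
print(segment_starts(0.6, 30, round_up=False))  # []

# With math.ceil the end becomes 1, so one segment starts at 0.
print(segment_starts(0.6, 30, round_up=True))   # [0]

# Longer audio still gets its usual start points.
print(segment_starts(65.2, 30, round_up=True))  # [0, 30, 60]
```

With truncation, a sub-second clip produces no segments at all, so anything that indexes results by segment ends up one entry short.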

Sep 03 '24 01:09 Pranjalya

What does "max_seg_len" do?

Nov 18 '24 16:11 LostnD