
Fix for small segments

Open Pranjalya opened this issue 2 years ago • 5 comments

Patch

  • Fix for small segments, where the audio duration is less than max_seg_len
  • Fallback for generate_segment_batched in case seq_len and seq_metadata are not provided

Pranjalya avatar Apr 05 '24 11:04 Pranjalya

I like it!

BBC-Esq avatar May 25 '24 02:05 BBC-Esq

Great fix. Without it, WhisperS2T is useless for short-duration audio.

HIGHLY recommend merging this pull request :)

Sembiance avatar Jun 12 '24 16:06 Sembiance

Hi @Pranjalya @Sembiance! Can you describe the problem here, or link an issue related to short-duration audio?

shashikg avatar Jul 06 '24 05:07 shashikg

Hey @shashikg, the issue was in the loop where we segment the audio into parts, in the case where the original audio's duration is < 1s. Setting the end of the range to int(audio_duration) makes it 0, and range with an end of 0 yields an empty sequence, so no segment is produced. Using math.ceil instead rounds the duration up to the next integer, so the audio segment's timestamp is logged. This bug is also potentially dangerous if someone is using indexing to map the audio segments, since some parts go missing.
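
A minimal sketch of the behaviour described above, not the actual WhisperS2T code: segment_starts, step, and round_up are hypothetical names used only for illustration. It compares truncating the duration with int() against rounding it up with math.ceil() when building the range of segment start offsets.

```python
import math


def segment_starts(audio_duration, step=1, round_up=True):
    """Return integer start offsets (in seconds) for segmenting the audio.

    Hypothetical helper for illustration only; `step` stands in for a
    segment length such as max_seg_len.
    """
    # int() truncates toward zero, so a 0.5 s clip gets an end of 0.
    # math.ceil() rounds up, so the same clip gets an end of 1.
    end = math.ceil(audio_duration) if round_up else int(audio_duration)
    return list(range(0, end, step))


print(segment_starts(0.5, round_up=False))  # [] : the sub-second clip is dropped
print(segment_starts(0.5))                  # [0] : one segment starting at 0
```

For durations of one second or more the two variants usually agree, which may be why the problem only shows up on very short clips.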

Pranjalya avatar Sep 03 '24 01:09 Pranjalya

What does "max_seg_len" do?

LostnD avatar Nov 18 '24 16:11 LostnD