
Preserving pts time gap on the audio to maintain video/audio sync

Open dodysw2 opened this issue 2 years ago • 2 comments

When Whisper is used to transcribe a video stream such as an mp4, the original stream (e.g. RTMP) sometimes has network issues that cause jumps in the pts (presentation timestamp). Currently, when the ffmpeg pipeline in faster-whisper's audio loading demuxes the input, it eliminates these jumps, so the transcription falls out of sync with the video. That is, the resulting audio is shorter than the video.

One suggestion is to add async resampling to the ffmpeg pipeline. I have tried this and it worked, by first converting the file to .wav with the ffmpeg CLI and then passing the result to whisper. A related question is here: https://stackoverflow.com/questions/52845150/use-ffmpeg-to-export-audios-with-gaps-filled . However, this is an extra step that slows down transcription, and it would be great if the same resampling were done directly within faster-whisper, maybe as an option to transcribe().
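For reference, the workaround could look roughly like the sketch below. It assumes the filter in question is ffmpeg's `aresample` with `async=1` (as in the linked Stack Overflow question) and that `ffmpeg` is on the PATH. The helper names `build_resample_command` and `resample_with_gaps` are made up for illustration and are not part of faster-whisper.

```python
import subprocess
from typing import List


def build_resample_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that decodes src to WAV while following audio timestamps.

    Hypothetical helper: the filter choice (aresample=async=1) is an assumption
    based on the linked Stack Overflow question, not faster-whisper's own pipeline.
    """
    return [
        "ffmpeg",
        "-y",                        # overwrite dst if it already exists
        "-i", src,
        "-vn",                       # ignore the video stream
        "-af", "aresample=async=1",  # fill/trim audio so it matches its timestamps
        "-ac", "1",                  # downmix to mono
        "-ar", str(sample_rate),     # resample to the requested rate
        dst,
    ]


def resample_with_gaps(src: str, dst: str, sample_rate: int = 16000) -> None:
    """Run the command above; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_resample_command(src, dst, sample_rate), check=True)
```

After calling `resample_with_gaps("input.ts", "audio.wav")`, the resulting `audio.wav` can be passed to `WhisperModel.transcribe()` as usual. This is the extra conversion step that I would like to avoid.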

Thanks.

Apr 08 '23 14:04 dodysw2

Do you know where I can find a file with these time gaps so that I can try implementing a solution?

Apr 08 '23 15:04 guillaumekln

Here's one sample (originally a .ts file, renamed to .mp4 so it could be uploaded): https://user-images.githubusercontent.com/46476260/230732618-b7b499d7-8c02-4811-82aa-bfc99528c1c1.mp4

Apr 08 '23 16:04 dodysw2