action-transcription

Run subtitle extraction and Whisper transcription steps in parallel

Open jakubadamw opened this issue 1 year ago • 0 comments

First, congratulations on your Bellingcat hackathon prize! This is a great tool serving an important purpose. 🙂

I had a look at the code and, aside from what's already tracked by #3, I noticed that the Whisper transcription and subtitle extraction steps are run in sequence. They could be extracted into standalone jobs (with any necessary artifacts passed around with upload-artifact and download-artifact), so that they run in parallel and the failure of one does not prevent the other from succeeding (yes, that could also be achieved with continue-on-error, but parallelism will be useful on its own). Running a video through both jobs at the same time could be useful if one isn't sure which of the transcriptions (from YouTube or Whisper) is going to be of better quality.
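To make the idea concrete, here is a rough sketch of what the split could look like. It relies on jobs without a `needs` relationship running in parallel by default. The job names, script names, and directory names below are placeholders of mine, not taken from the repository, so please treat this as an illustration rather than a drop-in workflow.

```yaml
# Sketch only: job names, script names, and paths are hypothetical.
name: Transcribe (parallel sketch)
on:
  workflow_dispatch:
    inputs:
      url:
        description: Video URL
        required: true

jobs:
  subtitles:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Extract existing subtitles
        # Hypothetical script; stands in for the current subtitle extraction step.
        run: ./extract_subtitles.sh "$VIDEO_URL" subtitles/
        env:
          VIDEO_URL: ${{ inputs.url }}
      - uses: actions/upload-artifact@v4
        with:
          name: subtitles
          path: subtitles/

  whisper:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Transcribe with Whisper
        # Hypothetical script; stands in for the current Whisper step.
        run: ./transcribe_whisper.sh "$VIDEO_URL" whisper/
        env:
          VIDEO_URL: ${{ inputs.url }}
      - uses: actions/upload-artifact@v4
        with:
          name: whisper
          path: whisper/

  publish:
    # Waits for both jobs, but still runs if one of them failed.
    needs: [subtitles, whisper]
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          # No "name" given: every available artifact is downloaded,
          # each into its own subdirectory of results/.
          path: results/
```

With this layout, the `publish` job would collect whatever the two jobs managed to produce, so a Whisper failure would not discard the YouTube subtitles, and vice versa.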

Another suggestion: one could move the script and requirements.txt into a subdirectory in .github so that they don't clutter the directory in which the results get stored. In fact, perhaps the results should be pushed to a separate branch altogether to keep things cleanly separated. Not a big deal, though, of course.
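For illustration, a minimal sketch of the separate-branch idea follows. It assumes a `results` branch that already exists and a `.github/scripts/` directory holding the script and `requirements.txt`; both names, as well as the script name and its `--output` flag, are my own placeholders.

```yaml
# Sketch only: branch name, directory layout, script name, and flag are hypothetical.
name: Transcribe (separate results branch sketch)
on:
  workflow_dispatch:

jobs:
  transcribe:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # lets the default GITHUB_TOKEN push commits
    steps:
      # Default branch: workflow, script, and requirements.txt.
      - uses: actions/checkout@v4
      # Results branch, checked out into its own directory.
      - uses: actions/checkout@v4
        with:
          ref: results
          path: results
      - name: Run the script
        run: |
          pip install -r .github/scripts/requirements.txt
          python .github/scripts/transcribe.py --output results/
      - name: Commit and push results
        working-directory: results
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add -A
          # Commit only when something actually changed.
          git diff --cached --quiet || git commit -m "Add transcription results"
          git push
```

The second checkout keeps the results in their own working tree, so the code on the default branch and the generated files never end up in the same commit.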

Please let me know if you are open to contributions, as I will be happy to do the above (if you think they're good ideas) myself. 🙂

Congrats again, and thank you!

jakubadamw, Sep 30 '22 11:09