whisper-diarization

Is this repo usable for a production use case?

Open utility-aagrawal opened this issue 1 year ago • 9 comments

Hi All,

I am wondering if anyone has used this repo for a production use case. Currently, I am using OpenAI Whisper for transcription but now want to add speaker diarization. I have tried pyannote in the past, but the results from this repo look much better. My concern is that the source code wasn't written with a production use case in mind: it is not very flexible, prints too many log messages, etc. I can rewrite this code, but then what happens when there are updates in the future? I would appreciate the community's input on this. Thanks!

utility-aagrawal avatar Jan 23 '24 16:01 utility-aagrawal

@MahmoudAshraf97 , will appreciate your take on this! Thanks for sharing your work!

utility-aagrawal avatar Jan 23 '24 16:01 utility-aagrawal

Hello, and thanks for the input. Please open a PR with any changes you think are useful, and we can discuss them together.

MahmoudAshraf97 avatar Jan 24 '24 13:01 MahmoudAshraf97

@MahmoudAshraf97 , Thanks for your understanding! This is what I want to do:

  1. Leave existing functionalities as-is.

  2. Please see the attached .txt file (whisper_diarization_stdout.txt). Currently, a lot of messages, warnings, and logs are displayed on the command line. I want to make this optional, so users can choose whether they see these messages.

  3. If users want, they should be able to run the whole pipeline locally, meaning they can download all the models to a directory beforehand. Faster-whisper and whisperX's load_align_model already support this. I can check whether the other models can also be used this way. Do you know if this is feasible? What other models are used in this pipeline? I still have to go through the code and don't have the answer yet.

  4. Format the code for readability and usability.
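
As a rough illustration of item 2, here is a minimal sketch of a quiet-by-default switch using only the Python standard library. The `--verbose` flag and the function names `build_parser` and `configure_output` are hypothetical; they are not existing options or functions in this repo. Libraries that set levels on their own named loggers may need those loggers adjusted separately.

```python
import argparse
import logging
import warnings


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI: --verbose is an illustrative flag, not an existing option.
    parser = argparse.ArgumentParser(description="Quiet-by-default output sketch")
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="show library warnings and info-level log messages",
    )
    return parser


def configure_output(verbose: bool) -> int:
    """Configure root logging and the warnings filter; return the chosen level."""
    level = logging.INFO if verbose else logging.ERROR
    # force=True replaces any handlers installed earlier (Python 3.8+).
    logging.basicConfig(level=level, force=True)
    # Hide warnings issued through the warnings module unless the user opts in.
    warnings.simplefilter("default" if verbose else "ignore")
    return level


# Example: parse an explicit (empty) argument list, which yields quiet mode.
args = build_parser().parse_args([])
configure_output(args.verbose)
logging.getLogger(__name__).info("Only shown when --verbose is passed")
```

Calling `configure_output` once, before the models are loaded, keeps the existing behaviour available behind the flag while making the default run quiet.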

Let me know what you think. It will take some time to make all these changes. Before I spend any time, I wanted to align with you. Thanks!

utility-aagrawal avatar Jan 25 '24 20:01 utility-aagrawal

@MahmoudAshraf97 , do you have any feedback?

utility-aagrawal avatar Feb 02 '24 20:02 utility-aagrawal

@MahmoudAshraf97 , thoughts?

utility-aagrawal avatar Feb 28 '24 16:02 utility-aagrawal

I'm not speaking for @MahmoudAshraf97 here, but if you take a look at his response from Jan 24, it's pretty clear. This is an open source project that he's doing for whatever his reasons are. @utility-aagrawal, you are treating it like a commercial product that you are paying for.

If you want these changes, you are free to implement them and submit PRs to get them merged into the project. If you are not a developer, you could pay someone to do the work and submit the patches.

aedocw avatar May 08 '24 19:05 aedocw

I have this running in a production environment - it’s stable, consistent, and does a great job

transcriptionstream avatar May 08 '24 23:05 transcriptionstream

@transcriptionstream could you please explain how and where you deployed it? I am searching for the cheapest way to launch this transcription service in production (as an API). I read on Reddit that Modal is a great option.

cristobal-larach avatar Sep 02 '24 23:09 cristobal-larach

@cristobal-larach Transcription Stream is usually deployed on-site via a virtual machine. On average, our clients process around 3000 hours of audio in roughly 40,000 recordings a month. Reach out via the contact form if you'd like some direction on implementing whisper-diarization or the community version of Transcription Stream.

transcriptionstream avatar Sep 03 '24 00:09 transcriptionstream