Deepgram Self-hosted
currently we have limits of 100 concurrent streams for DG + 100 concurrent streams for Soniox. if we hit 10K users we will need ~300 concurrent streams (roughly 3 streams per 100 users). then if 100K? yes, ~3,000 concurrent streams
-- from Damien (DG team): Then an additional 100 concurrencies are $10k on each tier. If you self-host Deepgram on your own infra there is no concurrency cost, since you can scale as large as you like with your own GPUs. We have Docker images and Helm charts if you want to deploy on Kubernetes etc. (https://developers.deepgram.com/docs/self-hosted-introduction). We will need an MNDA in place to provide access and share benchmarks for different GPUs.
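For reference, deploying the self-hosted stack on Kubernetes would roughly look like the sketch below. The Helm repo URL, chart name, and values keys are assumptions on my side; the linked self-hosted introduction has the authoritative steps.
# Rough sketch only -- repo URL, chart name, and values keys are assumptions;
# follow https://developers.deepgram.com/docs/self-hosted-introduction for the real steps.
helm repo add deepgram https://deepgram.github.io/self-hosted-resources   # assumed repo URL
helm repo update
helm install deepgram deepgram/deepgram-self-hosted \
  --namespace dg-self-hosted --create-namespace \
  -f my-values.yaml   # GPU node selectors, license credentials, replica counts, etc.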
hi @thainguyensunya it's time 🌚
This sounds extra fascinating! I want to host it myself, and any benchmarks or additional instructions would be highly appreciated!
Great project, I admire that you started it open source and continue to do so!
so is this done or not?
@kodjima33 No, not yet. There are still a few work items in progress:
- Support for Deepgram self-hosted in the backend: the backend currently uses the Deepgram Cloud API by default. I have submitted PR #1818 to add support for self-hosted Deepgram and am waiting for review (see the sketch after this list).
- Performance measurement.
- Auto-scaling strategy for Deepgram self-hosted, and testing of that strategy.
- Monitoring and alerting system for Deepgram self-hosted: this will be a separate task, but it must be completed before switching to Deepgram self-hosted in production.
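For context, once PR #1818 lands, pointing the backend at the self-hosted deployment should amount to a configuration change. A minimal sketch, assuming environment variables along these lines (the variable names here are placeholders, not necessarily the ones used in the PR):
# Hypothetical settings -- names are placeholders, not the actual ones from PR #1818.
export DEEPGRAM_SELF_HOSTED_ENABLED=true
export DEEPGRAM_SELF_HOSTED_URL="https://dg.omiapi.com"   # our self-hosted Deepgram endpoint
# then restart the backend so it picks up the new settings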
Tested Deepgram self-hosted (endpoint URL: https://dg.omiapi.com) with the Backend Dev environment on an Omi Dev Kit 2 and the Omi dev app. Transcription for audio streaming works as well as the native Deepgram Cloud. We will need more testing on the performance side of Deepgram self-hosted.
that's cool progress.
@thainguyensunya can you share more about your concerns here?
@beastoin I am concerned about the auto-scaling strategy and performance of Deepgram self-hosted versus Deepgram Cloud. I am still waiting for suggestions from the Deepgram Account Representative regarding the auto-scaling configuration for our current hardware.
Once we have an optimal auto-scaling configuration, we should conduct testing to evaluate its effectiveness and measure performance. To support this testing, I have prepared Grafana dashboards for Deepgram self-hosted.
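If anyone wants to sanity-check the raw numbers before they show up in Grafana, something like the sketch below should work. The namespace, deployment name, metrics port, and metric names are assumptions; the Deepgram self-hosted docs describe the actual metrics endpoint.
# Assumptions: namespace, deployment name, and metrics port below are placeholders.
kubectl -n dg-self-hosted port-forward deploy/deepgram-engine 9991:9991 &
curl -s http://localhost:9991/metrics | grep -i request   # eyeball request/stream counters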
piloting....
the monitoring system is up
the first signal
it works well with en, but too many language models are missing.
rolling back...
next, contacting the Deepgram Team to get all models...
Sorry to distract you with side questions, but I'm very curious to know more about the self-hosting solution you are working on here. As I understood from the Deepgram website, they only let you self-host their models for tons of money. I thought that to make self-hosting feasible for ordinary users, they'd have to use something like whisperX: https://github.com/m-bain/whisperX
Or are you preparing a self-hosting solution only for companies and businesses?
the current STT service we use is Deepgram (and they're good), so let's start with DG self-hosted first.
basically, if you can self-host DG, it won't be too hard to host another one.
anw, could you help create a new ticket for supporting other STTs, such as whisperX?
@GRbit
preparing for the 2nd piloting ...
wscat -c "wss://api.omi.me/v3/listen?language=bg&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ca&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=zh&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=zh-TW&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=zh-HK&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=cs&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=da&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=nl&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=en&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=multi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=et&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=fi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=nl-BE&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=fr&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=de&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=de-CH&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=el&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=hi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=hu&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=id&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=it&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ja&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ko&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=lv&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=lt&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ms&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=no&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=pl&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=pt&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ro&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ru&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=sk&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=es&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=sv&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=th&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=tr&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=uk&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=vi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
All DG-supported languages work.
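For future pilots, the same per-language sweep can be scripted instead of pasting each command. A minimal sketch using the exact commands above (wscat is interactive, so each connection still has to be closed manually before the next one starts):
# Same checks as above, looped over the language codes; token redacted as before.
for lang in bg ca zh zh-TW zh-HK cs da nl en multi et fi nl-BE fr de de-CH el hi hu id it ja ko lv lt ms no pl pt ro ru sk es sv th tr uk vi; do
  wscat -c "wss://api.omi.me/v3/listen?language=${lang}&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
done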
The Deepgram self-hosted production-grade setup is now up and running. Our baseline configuration includes:
- 2 Deepgram Engine pods
- 2 Deepgram API pods
- 1 Deepgram License Proxy pod
With this setup, we can handle ~90 concurrent streaming STT requests, with auto-scaling enabled to accommodate increased demand. Since this is a self-hosted deployment, regular maintenance is required, including the following tasks:
- Updating Models
- Installing product updates (Deepgram containers via Helm charts)
- Updating configuration files
- Managing Deepgram licenses
- Backing up Deepgram components
- Renewing the self-managed certificate before it expires
- Optimizing auto-scaling settings for performance and cost efficiency
This ensures the system remains stable, up to date, and cost-effective.
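On the auto-scaling point above: as a placeholder until the strategy is settled with the Deepgram Account Representative, even a plain CPU-based HPA can be put in front of the Engine deployment. This is only a sketch; the namespace, deployment name, and thresholds are assumptions, and since the Engine is GPU-bound, scaling on a custom metric such as active streams per pod is likely the better long-term choice.
# Placeholder sketch -- deployment name, namespace, and thresholds are assumptions.
kubectl -n dg-self-hosted autoscale deployment deepgram-engine --min=2 --max=6 --cpu-percent=70
kubectl -n dg-self-hosted get hpa deepgram-engine   # confirm the HPA is tracking the deployment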