
Deepgram Self-hosted

Open beastoin opened this issue 1 year ago • 20 comments

currently we have limits of 100 concurrencies for DG + 100 concurrencies for Soniox. if we hit 10K users then we will need ~300 concurrencies. and if 100K? yes, 3000 concurrencies

-- from Damien (DG team): Each additional 100 concurrencies is $10k on each tier. If you self-host Deepgram on your own infra there is no concurrency cost, since you can scale as large as you like with your own GPUs. We have Docker images and Helm charts if you want to deploy on Kubernetes etc. https://developers.deepgram.com/docs/self-hosted-introduction We will need an MNDA in place to provide access and share benchmarks for different GPUs.
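
The back-of-envelope scaling above can be written down directly; this sketch just encodes the ratios quoted in this thread (~300 concurrent streams at 10K users, $10k per extra block of 100 cloud concurrencies):

```shell
#!/bin/sh
# Rough estimate from the numbers above: ~300 concurrent streams at 10K users,
# i.e. about 3 streams per 100 users.
estimate_concurrency() {
  echo $(( $1 * 3 / 100 ))
}

# Cloud pricing quoted above: each additional block of 100 concurrencies is $10k.
extra_cloud_cost_usd() {
  echo $(( ( $1 + 99 ) / 100 * 10000 ))
}

estimate_concurrency 10000     # prints 300
estimate_concurrency 100000    # prints 3000
extra_cloud_cost_usd 3000      # prints 300000
```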

beastoin avatar Dec 09 '24 02:12 beastoin

hi @thainguyensunya it's time 🌚

beastoin avatar Jan 14 '25 02:01 beastoin

This sounds extra fascinating! I want to host it myself, and any benchmarks or additional instructions would be highly appreciated!

Great project, I admire that you started it open source and continue to do so!

GRbit avatar Jan 28 '25 16:01 GRbit

[screenshot]

beastoin avatar Feb 17 '25 11:02 beastoin

so is this done or not?

kodjima33 avatar Feb 18 '25 02:02 kodjima33

@kodjima33 No, not yet. There are still some work items in progress:

  1. Support for Deepgram self-hosted in the backend: the backend currently uses the Deepgram Cloud API by default. I have submitted PR#1818 to add support for self-hosted Deepgram and am waiting for review.
  2. Performance measurement.
  3. Auto-scaling strategy for Deepgram self-hosted, and testing it.
  4. Monitoring and alerting for Deepgram self-hosted: this will be a separate task, but it must be completed before switching to Deepgram self-hosted in production.
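
On item 1, the gist of supporting a self-hosted deployment is letting the backend point at a different base URL instead of the cloud endpoint. A minimal sketch of that switch (the `DEEPGRAM_SELF_HOSTED_URL` variable name and the query parameters are illustrative only, not necessarily what the PR does):

```shell
#!/bin/sh
# Choose the Deepgram streaming endpoint: a self-hosted host when configured,
# otherwise the public cloud API. Variable name is illustrative.
unset DEEPGRAM_SELF_HOSTED_URL
deepgram_listen_url() {
  base="${DEEPGRAM_SELF_HOSTED_URL:-wss://api.deepgram.com}"
  echo "${base}/v1/listen?model=$1&sample_rate=16000&encoding=linear16"
}

deepgram_listen_url nova-2
# prints wss://api.deepgram.com/v1/listen?model=nova-2&sample_rate=16000&encoding=linear16

( DEEPGRAM_SELF_HOSTED_URL="wss://dg.omiapi.com"; deepgram_listen_url nova-2 )
# prints wss://dg.omiapi.com/v1/listen?model=nova-2&sample_rate=16000&encoding=linear16
```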

thainguyensunya avatar Feb 18 '25 03:02 thainguyensunya

Tested Deepgram self-hosted (endpoint URL: https://dg.omiapi.com) with the Backend Dev environment on the Omi dev kit 2 and the Omi Dev app. Transcription for audio streaming works as well as native Deepgram Cloud. We will need more testing on the performance side of Deepgram self-hosted.

[screenshots]

thainguyensunya avatar Feb 18 '25 07:02 thainguyensunya

that's cool progress.

@thainguyensunya can you share more about your concerns here?

beastoin avatar Feb 23 '25 02:02 beastoin

@beastoin I am concerned about the auto-scaling strategy and performance of Deepgram self-hosted versus Deepgram Cloud. I am still waiting for suggestions from the Deepgram Account Representative regarding the auto-scaling configuration for our current hardware.

Once we have an optimal auto-scaling configuration, we should conduct testing to evaluate its effectiveness and measure performance. To support this testing, I have prepared Grafana dashboards for Deepgram self-hosted.
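
As one possible shape for that auto-scaling configuration, a Kubernetes HorizontalPodAutoscaler over the Engine pods could look like the sketch below. All names, replica counts, and thresholds are placeholders, not the settings recommended by the Deepgram Account Representative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deepgram-engine          # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deepgram-engine        # placeholder deployment name
  minReplicas: 2                 # illustrative floor
  maxReplicas: 8                 # illustrative ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # illustrative threshold; a request-count or
                                 # GPU-based metric may fit STT workloads better
```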

thainguyensunya avatar Feb 23 '25 02:02 thainguyensunya

piloting....

beastoin avatar Mar 01 '25 03:03 beastoin

[screenshot]

beastoin avatar Mar 01 '25 03:03 beastoin

the monitoring system is up

beastoin avatar Mar 01 '25 03:03 beastoin

[screenshot]

the first signal

beastoin avatar Mar 01 '25 03:03 beastoin

it works well with en, but too many models are missing.

beastoin avatar Mar 01 '25 03:03 beastoin

rolling back...

beastoin avatar Mar 01 '25 03:03 beastoin

next, contacting the Deepgram Team to get all models...

beastoin avatar Mar 01 '25 04:03 beastoin

> next, contacting the Deepgram Team to get all models...

Sorry to distract you with side questions, but I'm very curious to know more about the self-hosting solution you are working on here. As I understood from the Deepgram website, they only allow self-hosting their models for tons of money. I thought that to make self-hosting feasible for ordinary users, they'd have to use something like whisperX https://github.com/m-bain/whisperX

Or are you preparing a self-hosting solution only for companies and businesses?

GRbit avatar Mar 02 '25 17:03 GRbit

the current STT service we use is Deepgram (and they're good), so let's start with DG self-hosted first.

basically, if you can self-host DG, it won't be too hard to host another one.

anyway, could you help by creating a new ticket for supporting other STTs, such as whisperX?

@GRbit

beastoin avatar Mar 04 '25 04:03 beastoin

preparing for the 2nd piloting ...

beastoin avatar Mar 04 '25 04:03 beastoin

wscat -c "wss://api.omi.me/v3/listen?language=bg&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ca&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=zh&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=zh-TW&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=zh-HK&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=cs&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=da&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=nl&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=en&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=multi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=et&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=fi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=nl-BE&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=fr&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=de&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=de-CH&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=el&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=hi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=hu&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=id&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=it&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ja&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ko&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=lv&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=lt&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ms&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=no&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=pl&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=pt&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ro&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ru&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=sk&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=es&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=sv&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=th&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=tr&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=uk&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=vi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'

All DG-supported languages work.
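
The per-language smoke test above can also be generated from a loop rather than pasted line by line; same endpoint and parameters as the commands above, with the authorization token still redacted:

```shell
#!/bin/sh
# Emit one wscat command per supported language; pipe the output to sh to run
# them once a real authorization token is substituted for REDACTED.
gen_listen_cmds() {
  for lang in bg ca zh zh-TW zh-HK cs da nl en multi et fi nl-BE fr de de-CH \
              el hi hu id it ja ko lv lt ms no pl pt ro ru sk es sv th tr uk vi; do
    echo "wscat -c \"wss://api.omi.me/v3/listen?language=${lang}&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox\" -H 'authorization: REDACTED'"
  done
}

gen_listen_cmds | wc -l    # one command per language, 38 in total
```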

beastoin avatar Mar 04 '25 04:03 beastoin

The Deepgram self-hosted production-grade setup is now up and running. Our baseline configuration includes:

  • 2 Deepgram Engine pods
  • 2 Deepgram API pods
  • 1 Deepgram License Proxy pod

With this setup, we can handle ~90 concurrent streaming STT requests, with auto-scaling enabled to accommodate increased demand. Since this is a self-hosted deployment, regular maintenance is required, including the following tasks:

  • Updating Models
  • Installing product updates (Deepgram containers via Helm charts)
  • Updating configuration files
  • Managing Deepgram licenses
  • Backing up Deepgram components
  • Renewing the self-managed certificate before it expires
  • Optimizing auto-scaling settings for performance and cost efficiency

This ensures the system remains stable, up to date, and cost-effective.

thainguyensunya avatar Mar 04 '25 07:03 thainguyensunya