
Deepgram Self-hosted

Open beastoin opened this issue 1 year ago • 20 comments

currently we have limits of 100 concurrencies for DG + 100 concurrencies for Soniox. if we hit 10K users then we will need ~300 concurrencies. and if 100K? yes, 3000 concurrencies

-- from Damien (DG team): Each additional 100 concurrencies is $10k on each tier. If you self-host Deepgram on your own infra there is no concurrency cost, since you can scale as large as you like with your own GPUs. We have Docker images and Helm charts if you want to deploy on Kubernetes etc. https://developers.deepgram.com/docs/self-hosted-introduction We will need an MNDA in place to provide access and share benchmarks for different GPUs.
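
The back-of-envelope scaling above can be written down directly; this sketch just encodes the ratios quoted in this thread (~300 concurrent streams at 10K users, $10k per extra block of 100 cloud concurrencies):

```shell
#!/bin/sh
# Rough estimate from the numbers above: ~300 concurrent streams at 10K users,
# i.e. about 3 streams per 100 users.
estimate_concurrency() {
  echo $(( $1 * 3 / 100 ))
}

# Cloud pricing quoted above: each additional block of 100 concurrencies is $10k.
extra_cloud_cost_usd() {
  echo $(( ( $1 + 99 ) / 100 * 10000 ))
}

estimate_concurrency 10000     # prints 300
estimate_concurrency 100000    # prints 3000
extra_cloud_cost_usd 3000      # prints 300000
```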

beastoin avatar Dec 09 '24 02:12 beastoin

hi @thainguyensunya it's time 🌚

beastoin avatar Jan 14 '25 02:01 beastoin

This sounds extra fascinating! I want to host it myself, and any benchmarks or additional instructions would be highly appreciated!

Great project, I admire that you started it open source and continue to do so!

GRbit avatar Jan 28 '25 16:01 GRbit

[screenshot]

beastoin avatar Feb 17 '25 11:02 beastoin

so is this done or not?

kodjima33 avatar Feb 18 '25 02:02 kodjima33

@kodjima33 No, not yet. There are still some work items in progress:

  1. Support for Deepgram self-hosted in the backend: the backend currently uses the Deepgram Cloud API by default. I have submitted PR#1818 to add support for self-hosted Deepgram and am waiting for review.
  2. Performance measurement.
  3. Auto-scaling strategy for Deepgram self-hosted, and testing it.
  4. Monitoring and alerting for Deepgram self-hosted: this will be a separate task, but it must be completed before switching to Deepgram self-hosted in production.
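
On item 1, the gist of supporting a self-hosted deployment is letting the backend point at a different base URL instead of the cloud endpoint. A minimal sketch of that switch (the `DEEPGRAM_SELF_HOSTED_URL` variable name and the query parameters are illustrative only, not necessarily what the PR does):

```shell
#!/bin/sh
# Choose the Deepgram streaming endpoint: a self-hosted host when configured,
# otherwise the public cloud API. Variable name is illustrative.
unset DEEPGRAM_SELF_HOSTED_URL
deepgram_listen_url() {
  base="${DEEPGRAM_SELF_HOSTED_URL:-wss://api.deepgram.com}"
  echo "${base}/v1/listen?model=$1&sample_rate=16000&encoding=linear16"
}

deepgram_listen_url nova-2
# prints wss://api.deepgram.com/v1/listen?model=nova-2&sample_rate=16000&encoding=linear16

( DEEPGRAM_SELF_HOSTED_URL="wss://dg.omiapi.com"; deepgram_listen_url nova-2 )
# prints wss://dg.omiapi.com/v1/listen?model=nova-2&sample_rate=16000&encoding=linear16
```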

thainguyensunya avatar Feb 18 '25 03:02 thainguyensunya

Tested Deepgram self-hosted (endpoint URL: https://dg.omiapi.com) with the Backend Dev environment on the Omi dev kit 2 and the Omi Dev app. Transcription for audio streaming works as well as native Deepgram Cloud. We will need more testing on the performance side of Deepgram self-hosted.

[screenshots]

thainguyensunya avatar Feb 18 '25 07:02 thainguyensunya

that's cool progress.

@thainguyensunya can you share more about your concerns here?

beastoin avatar Feb 23 '25 02:02 beastoin

@beastoin I am concerned about the auto-scaling strategy and performance of Deepgram self-hosted versus Deepgram Cloud. I am still waiting for suggestions from the Deepgram Account Representative regarding the auto-scaling configuration for our current hardware.

Once we have an optimal auto-scaling configuration, we should conduct testing to evaluate its effectiveness and measure performance. To support this testing, I have prepared Grafana dashboards for Deepgram self-hosted.
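
As one possible shape for that auto-scaling configuration, a Kubernetes HorizontalPodAutoscaler over the Engine pods could look like the sketch below. All names, replica counts, and thresholds are placeholders, not the settings recommended by the Deepgram Account Representative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deepgram-engine          # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deepgram-engine        # placeholder deployment name
  minReplicas: 2                 # illustrative floor
  maxReplicas: 8                 # illustrative ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # illustrative threshold; a request-count or
                                 # GPU-based metric may fit STT workloads better
```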

thainguyensunya avatar Feb 23 '25 02:02 thainguyensunya

piloting....

beastoin avatar Mar 01 '25 03:03 beastoin

[screenshot]

beastoin avatar Mar 01 '25 03:03 beastoin

the monitoring system is up

beastoin avatar Mar 01 '25 03:03 beastoin

[screenshot]

the first signal

beastoin avatar Mar 01 '25 03:03 beastoin

it works well with en, but too many models are missing.

beastoin avatar Mar 01 '25 03:03 beastoin

rolling back...

beastoin avatar Mar 01 '25 03:03 beastoin

next, contacting the Deepgram Team to get all models...

beastoin avatar Mar 01 '25 04:03 beastoin

> next, contacting the Deepgram Team to get all models...

Sorry to distract you with side questions, but I'm very curious to know more about the self-hosting solution you are working on here. As I understood from the Deepgram website, they only allow self-hosting their models for tons of money. I thought that to make self-hosting feasible for ordinary users, they'd have to use something like whisperX https://github.com/m-bain/whisperX

Or are you preparing a self-hosting solution only for companies and businesses?

GRbit avatar Mar 02 '25 17:03 GRbit

the current STT service we use is Deepgram (and they're good), so let's start with DG self-hosted first.

basically, if you can self-host DG, it won't be too hard to host another one.

anyway, could you help by creating a new ticket for supporting other STTs, such as whisperX?

@GRbit

beastoin avatar Mar 04 '25 04:03 beastoin

preparing for the 2nd piloting ...

beastoin avatar Mar 04 '25 04:03 beastoin

wscat -c "wss://api.omi.me/v3/listen?language=bg&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ca&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=zh&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=zh-TW&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=zh-HK&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=cs&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=da&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=nl&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=en&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=multi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=et&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=fi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=nl-BE&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=fr&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=de&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=de-CH&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=el&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=hi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=hu&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=id&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=it&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ja&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ko&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=lv&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=lt&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ms&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=no&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=pl&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=pt&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ro&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=ru&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=sk&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=es&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=sv&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=th&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=tr&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=uk&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'
wscat -c "wss://api.omi.me/v3/listen?language=vi&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox" -H 'authorization: REDACTED'

All DG-supported languages work.
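
The per-language smoke test above can also be generated from a loop rather than pasted line by line; same endpoint and parameters as the commands above, with the authorization token still redacted:

```shell
#!/bin/sh
# Emit one wscat command per supported language; pipe the output to sh to run
# them once a real authorization token is substituted for REDACTED.
gen_listen_cmds() {
  for lang in bg ca zh zh-TW zh-HK cs da nl en multi et fi nl-BE fr de de-CH \
              el hi hu id it ja ko lv lt ms no pl pt ro ru sk es sv th tr uk vi; do
    echo "wscat -c \"wss://api.omi.me/v3/listen?language=${lang}&sample_rate=16000&codec=opus&uid=xyz&include_speech_profile=true&stt_service=soniox\" -H 'authorization: REDACTED'"
  done
}

gen_listen_cmds | wc -l    # one command per language, 38 in total
```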

beastoin avatar Mar 04 '25 04:03 beastoin

The Deepgram self-hosted production-grade setup is now up and running. Our baseline configuration includes:

  • 2 Deepgram Engine pods
  • 2 Deepgram API pods
  • 1 Deepgram License Proxy pod

With this setup, we can handle ~90 concurrent streaming STT requests, with auto-scaling enabled to accommodate increased demand. Since this is a self-hosted deployment, regular maintenance is required, including the following tasks:

  • Updating Models
  • Installing product updates (Deepgram containers via Helm charts)
  • Updating configuration files
  • Managing Deepgram licenses
  • Backing up Deepgram components
  • Renewing the self-managed certificate before it expires
  • Optimizing auto-scaling settings for performance and cost efficiency

This ensures the system remains stable, up to date, and cost-effective.

thainguyensunya avatar Mar 04 '25 07:03 thainguyensunya