feat: Sarvam.ai plugin for STT and TTS

Open AnshTanwar opened this issue 7 months ago • 2 comments

Description

This PR adds a new plugin for Sarvam.ai, an AI platform specializing in high-quality speech recognition and synthesis for Indian languages.

Features

  • Integrates Sarvam's "Saarika" models for Speech-to-Text
  • Integrates Sarvam's "Bulbul" models for Text-to-Speech
  • Provides support for multiple Indian languages including Hindi and Indian English

Implementation

  • Added STT class with proper error handling and configuration options
  • Added TTS class with appropriate streaming capabilities

Why Sarvam.ai?

Sarvam.ai offers specialized models for Indian languages and accents that aren't well-supported by other providers, making this a valuable addition to the LiveKit ecosystem for applications targeting the Indian market.

AnshTanwar avatar May 08 '25 20:05 AnshTanwar

Reviewers please verify and suggest any issues or improvements @theomonnom @longcw @davidzhao

AnshTanwar avatar May 10 '25 07:05 AnshTanwar

thanks for the PR. we'll get this reviewed soon.

davidzhao avatar May 19 '25 05:05 davidzhao

@davidzhao anything we can do to speed up the review of this PR?

kurianbenoy-sarvam avatar May 23 '25 08:05 kurianbenoy-sarvam

@AnshTanwar The default speaker is not compatible with the default model:

livekit.agents._exceptions.APIConnectionError: Unexpected error in Sarvam TTS: Sarvam TTS API Error: {"error":{"message":"Speaker 'meera' is not compatible with model bulbul:v2. Available speakers for bulbul:v2 are: anushka, abhilash, manisha, vidya, arya, karun, hitesh","code":"invalid_request_error","request_id":"20250526_80e7edd3-cb58-4aba-9008-bb057c1fa1cd"}} (status_code=400, request_id=None, body=None) {"tts": "livekit.agents.tts.stream_adapter.StreamAdapter", "attempt": 1, "streamed": true, "pid": 4426, "job_id": "AJ_ryA2xN8bKUso"}

You need to change it.
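
A minimal sketch of a local check that would catch this before the request is sent. The bulbul:v2 speaker list is copied from the error message above; `COMPATIBLE_SPEAKERS` and `validate_speaker` are hypothetical names for illustration and are not part of the plugin.

```python
# Hypothetical helper, not part of the plugin: fail fast on a bad model/speaker pair.
# The bulbul:v2 speaker list is copied from the API error message above.
COMPATIBLE_SPEAKERS: dict[str, frozenset[str]] = {
    "bulbul:v2": frozenset(
        {"anushka", "abhilash", "manisha", "vidya", "arya", "karun", "hitesh"}
    ),
}


def validate_speaker(model: str, speaker: str) -> None:
    """Raise ValueError if the speaker is not listed for the given model."""
    allowed = COMPATIBLE_SPEAKERS.get(model)
    if allowed is None:
        # Unknown model: nothing to check locally, so let the API decide.
        return
    if speaker not in allowed:
        raise ValueError(
            f"Speaker {speaker!r} is not compatible with model {model}. "
            f"Available speakers: {', '.join(sorted(allowed))}"
        )
```

Calling `validate_speaker("bulbul:v2", "meera")` raises a `ValueError`, while any of the seven listed speakers passes silently.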

You also forgot to include conn_options in the signature of the synthesize method. Without it, a TypeError is raised, as it was in my case.
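
To illustrate the failure mode, here is a sketch that uses stand-in classes rather than the real livekit imports. It assumes the caller forwards conn_options as a keyword argument, which would explain the TypeError. `ConnOptions`, `BrokenTTS`, `FixedTTS`, and `call_like_framework` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnOptions:
    """Stand-in for the real connection options type (assumption)."""

    max_retry: int = 3
    timeout: float = 10.0


class BrokenTTS:
    # conn_options is missing, so a caller that passes it by keyword gets a TypeError.
    def synthesize(self, text: str) -> str:
        return f"audio({text})"


class FixedTTS:
    # conn_options is accepted as a keyword-only argument with a default value.
    def synthesize(self, text: str, *, conn_options: ConnOptions = ConnOptions()) -> str:
        return f"audio({text}, retries={conn_options.max_retry})"


def call_like_framework(tts, text: str) -> str:
    """Mimics a caller that always forwards conn_options by keyword."""
    return tts.synthesize(text, conn_options=ConnOptions(max_retry=1))
```

With these stand-ins, `call_like_framework(BrokenTTS(), "hi")` raises a `TypeError` about an unexpected keyword argument, while the same call with `FixedTTS()` succeeds.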

joshiayush avatar May 26 '25 08:05 joshiayush

@AnshTanwar when is this going live? Super-excited to try with livekit!

apnatanmay avatar May 30 '25 12:05 apnatanmay

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 30 '25 19:05 CLAassistant

@joshiayush These issues are now resolved, thanks for pointing them out.

AnshTanwar avatar May 30 '25 19:05 AnshTanwar

@AnshTanwar when is this going live? Super-excited to try with livekit!

Hopefully really soon, once @davidzhao reviews the PR

AnshTanwar avatar May 30 '25 20:05 AnshTanwar

First of all thanks @AnshTanwar for this PR.

  • I recommend removing the bulbul:v1 speakers and dropping support for that model, as bulbul:v1 is deprecated from May 30, 2025 onwards.
  • Can we also add Saaras, our Speech-to-Text Translate model, which converts Indic speech to English?

kurianbenoy-sarvam avatar May 31 '25 06:05 kurianbenoy-sarvam

sorry for the delays folks, and thanks for upstreaming with the latest. we'll get this reviewed in the next few days. looking forward to making this available!

davidzhao avatar May 31 '25 06:05 davidzhao

Would be extremely helpful for me as well

arpan-reconectai avatar May 31 '25 17:05 arpan-reconectai

:x: Invalid Changeset Format Detected

One or more changeset files in this PR have an invalid format. Please ensure they adhere to:

  • Start with --- and include a closing --- on its own line.
  • Each package line must be in the format: "package-name": patch|minor|major
  • No duplicate package entries allowed.
  • A non-empty change description must follow the front matter.
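
The four rules above can be sketched as a small checker. This is not the bot's actual implementation; `validate_changeset` and `PACKAGE_LINE` are hypothetical names, and skipping blank lines inside the front matter is an assumption.

```python
import re

# Matches a package line such as: "package-name": patch
PACKAGE_LINE = re.compile(r'^"(?P<name>[^"]+)":\s*(?P<bump>patch|minor|major)$')


def validate_changeset(text: str) -> list[str]:
    """Return a list of problems; an empty list means all four rules pass."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["must start with ---"]
    try:
        # Index of the first closing --- after the opening one.
        close = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return ["missing closing --- on its own line"]

    errors: list[str] = []
    seen: set[str] = set()
    for line in lines[1:close]:
        if not line.strip():
            continue  # Assumption: blank lines in the front matter are ignored.
        match = PACKAGE_LINE.match(line.strip())
        if match is None:
            errors.append(f"bad package line: {line!r}")
        elif match["name"] in seen:
            errors.append(f"duplicate package: {match['name']}")
        else:
            seen.add(match["name"])
    # Everything after the closing --- is the change description.
    if not "\n".join(lines[close + 1:]).strip():
        errors.append("empty change description")
    return errors
```

For example, a file containing `---`, then `"some-package": patch` (an illustrative package name), then `---`, then a one-line description yields an empty list.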

Error details:

  • .github/next-release/changeset-08585fc3.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-1febe726.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-7feb6ffb.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-8cd717d7.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-96bdf598.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-b027afb9.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-b50551d3.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-c9392553.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-e1c5da38.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-e5f787fd.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-ec4e21ad.md: Failed to read file from git branch 'pr_head'.
  • .github/next-release/changeset-f264cb01.md: Failed to read file from git branch 'pr_head'.

github-actions[bot] avatar May 31 '25 18:05 github-actions[bot]

Is this live already? @AnshTanwar @sumit-sarvam

apnatanmay avatar Jun 05 '25 08:06 apnatanmay

Hope this makes it into the next release. Really looking forward to trying it out!

vivek-apna avatar Jun 07 '25 16:06 vivek-apna

@AnshTanwar @kurianbenoy-sarvam @sumit-sarvam

any chance you guys could address this?

davidzhao avatar Jun 08 '25 00:06 davidzhao

Thanks @AnshTanwar for quickly resolving the issues.

@davidzhao can you please review it?

kurianbenoy-sarvam avatar Jun 08 '25 17:06 kurianbenoy-sarvam

Hi @theomonnom @davidzhao,

I'm Vinayak from Sarvam. We’d like to add documentation for our LiveKit integration for TTS and STT to your documentation site. Could you please let us know the steps required to make this happen?

We're also interested in collaborating with your DevRel team—could you let us know the right point of contact and how we can get in touch with them?

Thanks!

vinayak-sarvam avatar Jun 13 '25 05:06 vinayak-sarvam

When is livekit's upcoming release, and will this be shipped with that release?

arpan-reconectai avatar Jun 13 '25 06:06 arpan-reconectai