agents icon indicating copy to clipboard operation
agents copied to clipboard

feat: add Cohere plugin for LiveKit Agents

Open darshankparmar opened this issue 2 months ago • 15 comments

darshankparmar avatar Dec 10 '25 16:12 darshankparmar

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Dec 10 '25 16:12 CLAassistant

Hello, and thank you for your PR! In most cases, if the API is quite similar to OpenAI's, we push new providers as additional function to OpenAI (e.g., recent PRs regarding OpenRouter or OVH) plugin. Could you update it to just be part of openai plugin?

Hormold avatar Dec 11 '25 01:12 Hormold

Hello, and thank you for your PR! In most cases, if the API is quite similar to OpenAI's, we push new providers as additional function to OpenAI (e.g., recent PRs regarding OpenRouter or OVH) plugin. Could you update it to just be part of openai plugin?

Hi @Hormold thanks for your message! I will. 😄

darshankparmar avatar Dec 11 '25 05:12 darshankparmar

Hey! Tested the Cohere integration and found two issues that need fixing before this can work properly with voice agents.

First, Cohere API returns 400: message must be at least 1 token long when there's no user message in the chat context. This happens in voice agents when generate_reply(instructions="...") is called without user input (like in on_enter() to greet the user). OpenAI handles this fine but Cohere doesn't... looks like Cohere requires at least one user message to generate a response.

Second, tool calling breaks with 400: schema 'type' must be a string. Array 'type' is unsupported for this model. Cohere's OpenAI-compatible API seems to have stricter JSON schema requirements than OpenAI - it doesn't accept union/array types that LiveKit generates for function tools. Need to figure out what schema format Cohere actually expects and adapt the tool serialization.

Hormold avatar Dec 11 '25 20:12 Hormold

Thanks for the feedback!

darshankparmar avatar Dec 12 '25 14:12 darshankparmar

I took a look at the first issue. One idea: we could add a check in the chat method (Cohere-only) to see if there’s at least one user message in the context. If not, we either auto-inject a small placeholder user message (e.g. “Hello”) so Cohere is happy, or just throw an exception instead. Not sure which direction makes more sense here, but this would at least guarantee we don’t hit that 400 on empty contexts.

darshankparmar avatar Dec 12 '25 14:12 darshankparmar

I dug into the second issue as well. Ended up fixing it by setting _strict_tool_schema=False for Cohere.

darshankparmar avatar Dec 12 '25 15:12 darshankparmar

I took a look at the first issue. One idea: we could add a check in the chat method (Cohere-only) to see if there’s at least one user message in the context. If not, we either auto-inject a small placeholder user message (e.g. “Hello”) so Cohere is happy, or just throw an exception instead. Not sure which direction makes more sense here, but this would at least guarantee we don’t hit that 400 on empty contexts.

@Hormold can you confirm?

darshankparmar avatar Dec 19 '25 10:12 darshankparmar

Adding a placeholder message should be fine. This is what we did with Gemini Realtime: Screenshot 2025-12-19 at 11 27 11

chenghao-mou avatar Dec 19 '25 11:12 chenghao-mou

@Hormold pushed the Cohere issues fix, PR is ready for review.

darshankparmar avatar Dec 21 '25 03:12 darshankparmar

Hey, I tested, and the PR looks good. One minor thing is a conflict here. Also, I encountered a couple of timeouts on Cohere responses.

Hormold avatar Dec 22 '25 22:12 Hormold

Merge break imports. Could you please add ChatMessage to imports again?

Hormold avatar Dec 22 '25 22:12 Hormold

Hey, I tested, and the PR looks good. One minor thing is a conflict here. Also, I encountered a couple of timeouts on Cohere responses.

Thanks for testing! Regarding the timeouts - Cohere API can have high latency (25+ seconds TTFT) which may cause timeout errors in real-time applications. Consider:

  1. For a small, fast model: command-r7b-12-2024
  2. For general purpose use: command-r-08-2024
  3. For more advanced capabilities: command-a-03-2025
  4. Increasing timeout values for production use
  5. Setting appropriate max_completion_tokens to reduce response time

I also updated the latest Cohere Command models (text generation).

darshankparmar avatar Dec 23 '25 04:12 darshankparmar

Thanks! Works great!

Hormold avatar Dec 23 '25 21:12 Hormold

Thanks! Works great!

Thanks for the feedback! Glad it's working well. Any final changes needed or ready to merge? 🚀

darshankparmar avatar Dec 24 '25 18:12 darshankparmar