
cohere_client = cohere.ClientV2 returned "Resource not found error"

richardhu6079 opened this issue 8 months ago

Describe the bug After deploying the Cohere-embed-v3-english model, creating embeddings through the serverless endpoint worked, but using the SDK with:

cohere_client = cohere.ClientV2

returned a "Resource not found" error.

Using cohere_client = cohere.Client instead returns the embeddings:

Screenshots

[screenshot]

[screenshot]

richardhu6079 avatar May 12 '25 18:05 richardhu6079

@richardhu6079 which environment are you hitting the model at? Is this Azure?

mkozakov avatar May 12 '25 18:05 mkozakov

@richardhu6079 can you try not appending /v1 to the baseURL?

mkozakov avatar May 12 '25 18:05 mkozakov
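
The suggestion above can be sketched with a small, purely illustrative helper (the function name and endpoint URL here are hypothetical, not from the SDK): assuming the SDK already appends a versioned path such as `v1/embed` to `base_url`, as the comment suggests, a `base_url` that itself ends in `/v1` produces a doubled path segment that the server answers with a 404.

```python
# Illustrative sketch only: shows how a trailing /v1 in base_url can double up
# with the versioned path the SDK appends internally.

def request_url(base_url: str, sdk_path: str = "v1/embed") -> str:
    """Join base_url with the path the SDK appends on each request."""
    return base_url.rstrip("/") + "/" + sdk_path

# base_url with a trailing /v1 yields a path the endpoint does not serve:
print(request_url("https://example-endpoint.azure.com/v1"))
# -> https://example-endpoint.azure.com/v1/v1/embed

# base_url without it yields the intended path:
print(request_url("https://example-endpoint.azure.com"))
# -> https://example-endpoint.azure.com/v1/embed
```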

@mkozakov Yes, on the Azure AI Foundry model catalog.

richardhu6079 avatar May 12 '25 18:05 richardhu6079

I was able to reproduce the issue and am talking to Azure.

mkozakov avatar May 12 '25 19:05 mkozakov

Hi @mkozakov, could you please let me know if there are any updates on this issue, or an ETA for the fix? Thanks

richardhu6079 avatar May 15 '25 13:05 richardhu6079

I have encountered the same issue. Today, I am experiencing a 500 Internal Server Error when using the v1 Client.

import cohere

# AI_SERVICES_API_KEY, AI_SERVICES_ENDPOINT, and EMBEDDING_MODEL_NAME are
# environment-specific configuration values defined elsewhere.

def cohere_client_embed(text: str):
    cohere_client = cohere.Client(
        api_key=AI_SERVICES_API_KEY, base_url=f"{AI_SERVICES_ENDPOINT}/models")
    response = cohere_client.embed(
        model=EMBEDDING_MODEL_NAME,
        texts=[text],
        input_type="search_document",
        batching=False
    )
    return response.embeddings[0]

Running this just outputs this error:

[screenshot]

joselo85 avatar May 15 '25 16:05 joselo85

@richardhu6079 Azure says that you need to turn off this setting before deploying the model:

[screenshot]

[screenshot]

mkozakov avatar May 15 '25 19:05 mkozakov

@joselo85 your issue sounds unrelated. Can you please share your exact request (including the texts input) so I can try to reproduce it?

mkozakov avatar May 15 '25 19:05 mkozakov

@mkozakov sure!

As I mentioned, I'm using Cohere's library to send requests to a "Cohere-embed-v3-english" model deployed in an Azure AI Services resource.

import cohere


def cohere_client_embed(text: str):
    cohere_client = cohere.Client(
        api_key=AI_SERVICES_API_KEY, base_url="https://cohere-dev-test.cognitiveservices.azure.com/models")
    response = cohere_client.embed(
        model="Cohere-embed-v3-english",
        texts=[text],
        input_type="search_document",
        batching=False
    )
    return response.embeddings[0]

print(cohere_client_embed("Hello World!"))

What surprises me is that the code above worked just fine until last Tuesday (14-05).

joselo85 avatar May 15 '25 21:05 joselo85

@richardhu6079 I confirmed that deploying Embed V4 without that toggle enabled works.

mkozakov avatar May 15 '25 23:05 mkozakov

@joselo85 I just deployed embed-english-v3 and was not able to reproduce your issue. Are you sure you're deploying with that flag set to off?

mkozakov avatar May 16 '25 00:05 mkozakov

I am not sure; I will check. But what's strange is that the same deployment with the same code was working perfectly fine, then all of a sudden it started returning that 500 error. As soon as I test that setting I'll let you know.

joselo85 avatar May 16 '25 00:05 joselo85

Azure folks are investigating it, hopefully we get to the bottom of why it started erroring. How did re-deploying go?

mkozakov avatar May 16 '25 21:05 mkozakov

I redeployed with that feature turned off, but it did not solve the 500 error. Currently, I am able to generate embeddings using the Azure AI Inference SDK:

from azure.ai.inference import EmbeddingsClient
from azure.ai.inference.models import EmbeddingInputType
from azure.core.credentials import AzureKeyCredential

async def ai_inference_client_embed(text: str):
    azure_credential = AzureKeyCredential(AI_SERVICES_API_KEY)
    embeddings_client = EmbeddingsClient(
        endpoint="https://cohere-dev-test.services.ai.azure.com/models", # This deployment has the feature turned off
        credential=azure_credential)
    try:
        response = embeddings_client.embed(
            dimensions=1024,
            model=EMBEDDING_MODEL_NAME,  # "Cohere-embed-v3-english"
            input=[text],
            input_type=EmbeddingInputType.DOCUMENT,
        )
        embeddings = response['data'][0]['embedding']
        return embeddings
    except Exception as e:
        print(f"Error while generating embeddings: {e}")
        raise

joselo85 avatar May 19 '25 14:05 joselo85

Hi @mkozakov, are there any updates from the Azure folks? Thanks

richardhu6079 avatar May 21 '25 16:05 richardhu6079

@richardhu6079 you should be unblocked by redeploying using their instructions, right? I was able to go through the process successfully.

mkozakov avatar May 22 '25 19:05 mkozakov

For other people dealing with this: it seems there are two quite different ways of deploying Cohere on Azure.

1. Serverless Endpoint

This was the default for us and is only available when "Deploy models to Azure AI model inference service" within "Preview Features" is set to OFF. This spins up serverless endpoints that can be used with the Cohere client directly:

import cohere

cohere_async_client = cohere.AsyncClientV2(
    api_key="API_KEY",
    base_url="URL",
)

await cohere_async_client.embed(
    model="embed-v4.0", # actual embedding model name
    texts=["Hello, world!"],
    input_type="search_query",
    embedding_types=["float"],
)

This is very nice, but unfortunately it has an incredibly low requests-per-minute limit (note that the limit is on requests per minute, not tokens per minute).

But Azure does not limit the number of projects you can create (each with its own serverless endpoint). This is a pain, but you can set up some basic rotation across a list of N clients to avoid hitting the rate limit.
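
The rotation idea above can be sketched as follows; this is a hypothetical illustration, with stand-in client objects where in practice each would be a cohere client built with that endpoint's api_key and base_url.

```python
from itertools import cycle

class FakeClient:
    """Stand-in for a cohere client bound to one serverless endpoint."""
    def __init__(self, name: str):
        self.name = name

    def embed(self, texts):
        # A real client would call the endpoint here.
        return f"{self.name} embedded {len(texts)} text(s)"

# One client per project/endpoint; cycle() yields them round-robin forever,
# spreading calls out so no single endpoint hits its requests-per-minute cap.
clients = cycle([FakeClient(f"endpoint-{i}") for i in range(3)])

def embed_round_robin(texts):
    # Each call goes to the next endpoint in the rotation.
    return next(clients).embed(texts)

print(embed_round_robin(["a"]))  # -> endpoint-0 embedded 1 text(s)
print(embed_round_robin(["b"]))  # -> endpoint-1 embedded 1 text(s)
```

This only smooths request volume; it does not retry on a 429, so a production version would likely want backoff on top of the rotation.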

2. Global Endpoint

This is only available when you have "Deploy models to Azure AI model inference service" within "Preview Features" set to ON. This gives you access to the Global Standard deployment type, which supposedly has significantly higher rate limits.

Yet, if you go with this option you can no longer use the Cohere client and are forced to swap to the Azure client. The Azure client does not support async, and passing arguments such as dtypes is much more confusing (I'm not even sure if this is possible).

from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

model = EmbeddingsClient(
    endpoint="GLOBAL_ENDPOINT",
    credential=AzureKeyCredential(api_key),
    model="GLOBAL_ENDPOINT_NAME",
)

response = model.embed(
    input=["Hello world"],
)

Filimoa avatar May 25 '25 16:05 Filimoa

Thank you @Filimoa!

mkozakov avatar Jun 17 '25 00:06 mkozakov