cohere_client = cohere.ClientV2 returned "Resource not found error"
Describe the bug
After deploying Cohere-embed-v3-english, creating embeddings through the serverless endpoint worked, but calling the SDK with `cohere_client = cohere.ClientV2` returned a "Resource not found" error.
Using `cohere_client = cohere.Client` instead returns the embeddings correctly.
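A likely explanation (an assumption on my part, not confirmed by the SDK source): the v1 `Client` posts to the `/v1/embed` route while `ClientV2` posts to `/v2/embed`, and the Azure serverless deployment only serves the v1 route, so the v2 path returns "Resource not found". A sketch of the path difference against a placeholder endpoint:

```python
# Hypothetical illustration of the route difference (placeholder endpoint,
# not the SDK's actual URL-building code):
base_url = "https://my-deployment.eastus.models.ai.azure.com"

v1_url = f"{base_url}/v1/embed"  # route served by the Azure serverless deployment
v2_url = f"{base_url}/v2/embed"  # route ClientV2 would target -> "Resource not found"

assert v1_url.endswith("/v1/embed")
assert v2_url.endswith("/v2/embed")
```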
@richardhu6079 which environment are you hitting the model at? Is this Azure?
@richardhu6079 can you try not appending /v1 to the baseURL?
@mkozakov yes, on the Azure AI Foundry model catalog.
I was able to repro the issue, talking to Azure
Hi @mkozakov, could you please let me know of any updates on this issue, or an ETA for the fix? Thanks
I have encountered the same issue. Today I am experiencing a 500 internal server error when using the v1 Client.
```python
import cohere

def cohere_client_embed(text: str):
    cohere_client = cohere.Client(
        api_key=AI_SERVICES_API_KEY,
        base_url=f"{AI_SERVICES_ENDPOINT}/models",
    )
    response = cohere_client.embed(
        model=EMBEDDING_MODEL_NAME,
        texts=[text],
        input_type="search_document",
        batching=False,
    )
    return response.embeddings[0]
```
Running this just outputs the 500 error mentioned above.
@richardhu6079 Azure are saying that you need to un-toggle this setting before deploying the model:
@joselo85 your issue sounds unrelated; can you please share your exact request (including the texts input) so I can try to reproduce it?
@mkozakov sure!
As I mentioned, I'm using Cohere's library to send requests to a "Cohere-embed-v3-english" model deployed in an Azure AI Services resource.
```python
import cohere

def cohere_client_embed(text: str):
    cohere_client = cohere.Client(
        api_key=AI_SERVICES_API_KEY,
        base_url="https://cohere-dev-test.cognitiveservices.azure.com/models",
    )
    response = cohere_client.embed(
        model="Cohere-embed-v3-english",
        texts=[text],
        input_type="search_document",
        batching=False,
    )
    return response.embeddings[0]

print(cohere_client_embed("Hello World!"))
```
What surprises me is that the code above worked just fine until last Tuesday (14-05).
@richardhu6079 I confirmed that deploying Embed V4 without that toggle enabled works.
@joselo85 I just deployed embed-english-v3 and was not able to reproduce your issue. Are you sure you're deploying with that flag set to off?
I am not sure; I will check. But what's strange is that the same deployment with the same code was working perfectly fine, then all of a sudden it started returning that 500 error. As soon as I test that setting I'll let you know.
Azure folks are investigating it, hopefully we get to the bottom of why it started erroring. How did re-deploying go?
I redeployed with that feature turned off, but it did not solve the 500 error. Currently, I am able to generate embeddings using the Azure AI Inference SDK:
```python
from azure.ai.inference import EmbeddingsClient
from azure.ai.inference.models import EmbeddingInputType
from azure.core.credentials import AzureKeyCredential

async def ai_inference_client_embed(text: str):
    azure_credential = AzureKeyCredential(AI_SERVICES_API_KEY)
    embeddings_client = EmbeddingsClient(
        endpoint="https://cohere-dev-test.services.ai.azure.com/models",  # This deployment has the feature turned off
        credential=azure_credential,
    )
    try:
        response = embeddings_client.embed(
            dimensions=1024,
            model="Cohere-embed-v3-english",
            input=[text],
            input_type=EmbeddingInputType.DOCUMENT,
        )
        return response['data'][0]['embedding']
    except Exception as e:
        print(f"Error while generating embeddings: {e}")
        raise
```
Hi @mkozakov, are there any updates from the Azure folks? Thanks
@richardhu6079 you should be unblocked by redeploying using their instructions, right? I was able to go through the process successfully.
For other people dealing with this: it seems there are two ways of deploying Cohere on Azure that are quite different.
1. Serverless Endpoint
This was the default for us and is only available when "Deploy models to Azure AI model inference service" under "Preview Features" is set to OFF. This spins up serverless endpoints that can be used with the Cohere client directly:
```python
import cohere

cohere_async_client = cohere.AsyncClientV2(
    api_key="API_KEY",
    base_url="URL",
)

await cohere_async_client.embed(
    model="embed-v4.0",  # actual embedding model name
    texts=["Hello, world!"],
    input_type="search_query",
    embedding_types=["float"],
)
```
This is very nice, but unfortunately it has an incredibly low requests-per-minute limit (note: the limit is on requests, not tokens, per minute).
But Azure does not limit the number of projects you can create (each with its own serverless endpoint). This is a pain, but you can set up some basic rotation across a list of N clients to avoid hitting the rate limit.
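A minimal sketch of that rotation, assuming you have already created one client per project (the round-robin itself is plain `itertools.cycle`, so it works with any client objects, e.g. `cohere.AsyncClientV2` instances):

```python
import itertools

def make_client_rotation(clients):
    """Return a callable that hands out clients round-robin, spreading
    requests across endpoints so no single one hits its rate limit."""
    pool = itertools.cycle(clients)
    return lambda: next(pool)

# Demo with stand-ins for real client objects:
next_client = make_client_rotation(["client-a", "client-b", "client-c"])
assert [next_client() for _ in range(4)] == ["client-a", "client-b", "client-c", "client-a"]
```

Each request would then call `next_client()` to pick the endpoint to send to; anything fancier (per-client request counters, backoff on 429s) can be layered on top of the same idea.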
2. Global Endpoint
This is only available when you have "Deploy models to Azure AI model inference service" within "Preview Features" set to ON.
This will give you access to the Global Standard deployment type. Supposedly this has significantly higher rate limits.
Yet if you go with this option you can no longer use the Cohere client and are forced to swap to the Azure client. The Azure client does not support async, and passing arguments such as dtypes is much more confusing (I'm not even sure it's possible).
```python
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

model = EmbeddingsClient(
    endpoint="GLOBAL_ENDPOINT",
    credential=AzureKeyCredential(api_key),
    model="GLOBAL_ENDPOINT_NAME",
)

response = model.embed(
    input=["Hello world"],
)
```
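On the async point: it may be worth checking whether your `azure-ai-inference` version ships async clients under `azure.ai.inference.aio` (I believe newer releases do, but verify against your installed version). Failing that, a generic workaround (a sketch, not Azure's recommended pattern) is to run the blocking embed call in a worker thread with `asyncio.to_thread`:

```python
import asyncio

async def embed_async(embed_fn, texts):
    """Run a blocking embed call in a worker thread so it doesn't block
    the event loop. embed_fn is any synchronous callable, e.g. a small
    wrapper around EmbeddingsClient.embed."""
    return await asyncio.to_thread(embed_fn, texts)

# Demo with a stand-in for the real blocking call:
def fake_embed(texts):
    return [[0.0] * 3 for _ in texts]

result = asyncio.run(embed_async(fake_embed, ["Hello world"]))
assert result == [[0.0, 0.0, 0.0]]
```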
Thank you @Filimoa!