[Feature]: Codestral from provider azure_ai
The Feature
Hi!
I would love support for the /fim/completions endpoint when hosting Codestral through Azure AI. The endpoint looks like this: https://Codestral-2501-xxxxxx.models.ai.azure.com/v1/fim/completions
There seem to be hacks around this using the model provider text-completion-codestral, but I haven't gotten them to work.
I've tried both of these setups:
- model_name: codestral-2501
  litellm_params:
    model: text-completion-codestral/codestral-2501
    api_key: os.environ/AZURE_AI_API_KEY
    api_base: https://Codestral-2501-xxxx.models.ai.azure.com/v1/fim/completions
and
- model_name: codestral-2501
  litellm_params:
    model: azure_ai/codestral-2501
    api_key: os.environ/AZURE_AI_API_KEY
    api_base: https://Codestral-2501-xxxx.models.ai.azure.com/v1/fim/completions
Please advise on whether this should work, or if this is indeed a feature we need!
Motivation, pitch
This would enable code autocompletion with Codestral hosted on Azure AI, which is a pretty common use case.
Are you a ML Ops Team?
No
Twitter / LinkedIn details
No response
This seems to be the workaround https://github.com/BerriAI/litellm/issues/5502
Linking this for reference. I am also interested in a solution.
@FabianHertwig Thanks for the link. The suggested workaround seems to work for codestral-latest and codestral-2407 but not for codestral-2501 which is the model I'm interested in. I tried the linked suggestion without any success.
I am interested in seeing if anyone got a workaround to work :)
For me it works with this config when sending requests through Python, but I want to make it work with continue.dev, which fails so far.
Config:
- model_name: codestral-latest
  litellm_params:
    model: text-completion-codestral/codestral-2501
    api_base: "https://Codestral-2501-xxxx.swedencentral.models.ai.azure.com/v1/fim/completions"
    api_key: os.environ/KEY_XXXX
Python script:
import requests
import logging
import os

logging.basicConfig(level=logging.INFO)


def get_models(api_base):
    """List the models exposed by the LiteLLM proxy."""
    models_url = f"{api_base.rstrip('/')}/models"
    params = {"return_wildcard_routes": "false"}
    try:
        models_response = requests.get(models_url, headers=headers, params=params)
        models_response.raise_for_status()
        logging.info("Models Status Code: %s", models_response.status_code)
        logging.info("Available Models: %s", models_response.text)
        return models_response.json()
    except requests.exceptions.RequestException as e:
        logging.error("Error fetching models: %s", e)
        return None


def callCodestral(api_base, prompt, suffix, max_tokens=50, temperature=0.7):
    """Send a prompt/suffix pair to the proxy's /v1/completions endpoint."""
    url = f"{api_base.rstrip('/')}/v1/completions"
    payload = {
        "prompt": prompt,
        "suffix": suffix,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "model": "codestral-latest",
    }
    try:
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()  # Raises an HTTPError for bad responses
        logging.info("Status Code: %s", response.status_code)
        logging.info("Response Body: %s", response.text)
        return response.json()
    except requests.exceptions.RequestException as e:
        logging.error("An error occurred: %s", e)
        return None


def callGpt4o(api_base, prompt, max_tokens=50, temperature=0.7):
    """Send a chat request to the proxy's /v1/chat/completions endpoint."""
    url = f"{api_base.rstrip('/')}/v1/chat/completions"
    payload = {
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "model": "gpt-4o",
        "stream": False
    }
    try:
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()  # Raises an HTTPError for bad responses
        logging.info("Status Code: %s", response.status_code)
        logging.info("Response Body: %s", response.text)
        return response.json()
    except requests.exceptions.RequestException as e:
        logging.error("An error occurred: %s", e)
        return None


api_key = ""
api_base = "http://localhost:4000"
headers = {
    "Content-Type": "application/json",
    "x-goog-api-key": api_key
}
logging.info(headers)

get_models(api_base)
# callGpt4o(api_base, "def hello_world():\n print('Hello, world!')\n\nhello_world()")
callCodestral(api_base, "def calculate_pi(iterations):", " return result")
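A side note on the headers in the script above: the x-goog-api-key header is unusual for a LiteLLM proxy and presumably only works here because the proxy runs without key auth. If a master or virtual key is configured, the proxy expects a bearer token instead; a hedged alternative with a placeholder environment variable:

# Assumed env var name; the LiteLLM proxy authenticates via an Authorization: Bearer header.
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ.get('LITELLM_API_KEY', 'sk-xxxx')}",
}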
I also want it to work for Continue, so let's try to make it work together! I'll try your script too.
For autocomplete, Continue calls url = f"{api_base.rstrip('/')}/fim/completions", not url = f"{api_base.rstrip('/')}/v1/completions" (this depends a bit on which provider you configure in Continue, but if you set it to Mistral it does).
That doesn't work with my setup.
Interesting! Makes sense, as the Mistral API is https://api.mistral.ai/v1/fim/completions (https://docs.mistral.ai/api/#tag/fim).
I wasn't able to debug the Continue side yet and couldn't see what Continue actually sends. How were you able to see it?
But it looks like LiteLLM actually needs the /v1/fim/completions endpoint to really be compatible.
The LiteLLM docs are also a bit confusing, as there is https://docs.litellm.ai/docs/providers/mistral and https://docs.litellm.ai/docs/providers/codestral, and both pages state "All models listed here https://docs.mistral.ai/platform/endpoints are supported", which is probably only true for the completion use case.
I set up a local proxy between Continue and LiteLLM to log all the requests and responses. This is what the request looks like:
{
  "timestamp": "2025-03-14T11:50:33.031801",
  "method": "POST",
  "path": "/fim/completions",
  "query_params": "",
  "headers": {
    "accept": "application/json",
    "accept-encoding": "gzip, deflate, br",
    "authorization": "REDACTED",
    "content-length": "3440",
    "content-type": "application/json",
    "user-agent": "node-fetch",
    "x-api-key": "XXXXX",
    "host": "0.0.0.0:8100",
    "connection": "close"
  },
  "body": {
    "model": "codestral-2501",
    "prompt": "STUFF",
    "suffix": "STUFF",
    "max_tokens": 4096,
    "temperature": 0.01,
    "stop": [
      "[PREFIX]",
      "[SUFFIX]",
      "\n+++++ ",
      "/src/",
      "#- coding: utf-8",
      "```"
    ],
    "stream": true
  }
}
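For anyone who wants to reproduce this, here is a minimal sketch of such a logging proxy. The port 8100 matches the host header in the log above and the LiteLLM address matches the earlier script, but both are assumptions about this particular setup; streamed responses are simply buffered and logged, not re-streamed.

# Minimal logging reverse proxy: Continue -> this script (port 8100) -> LiteLLM.
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

UPSTREAM = "http://localhost:4000"  # assumed LiteLLM proxy address
logging.basicConfig(level=logging.INFO)


class LoggingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and log the incoming request body.
        length = int(self.headers.get("content-length", 0))
        body = self.rfile.read(length)
        logging.info("POST %s\n%s", self.path, body.decode("utf-8", errors="replace"))

        # Forward to LiteLLM, dropping hop-specific headers.
        skip = {"host", "content-length", "accept-encoding"}
        fwd_headers = {k: v for k, v in self.headers.items() if k.lower() not in skip}
        resp = requests.post(f"{UPSTREAM}{self.path}", data=body, headers=fwd_headers)
        logging.info("Response %s\n%s", resp.status_code, resp.text)

        # Return the (buffered) upstream response to the client.
        self.send_response(resp.status_code)
        self.send_header("Content-Type", resp.headers.get("Content-Type", "application/json"))
        self.end_headers()
        self.wfile.write(resp.content)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8100), LoggingProxy).serve_forever()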
The other option is to use a pass-through endpoint, but I struggled to get the keys to work correctly there.
I have been looking through the codebase a bit.
There is definitely no /fim/completions endpoint. Here is where all the endpoints are defined:
https://github.com/BerriAI/litellm/blob/b5c32c913bc10b56362a140794f964f4d0c57ba1/litellm/proxy/proxy_server.py#L3467-L3501
There seems to be an implementation for Codestral FIM here, but I haven't had enough time yet to understand it: litellm/llms/codestral/completion
Here is also something about FIM completion in the Vertex provider: https://github.com/BerriAI/litellm/blob/b5c32c913bc10b56362a140794f964f4d0c57ba1/litellm/llms/vertex_ai/vertex_ai_partner_models/main.py#L171
Maybe next week I'll have more time to take a closer look and be able to contribute the endpoint implementation.
Hi again. I share your view that the endpoint is not implemented. The Codestral implementation in litellm/llms/codestral/completion is not reachable if I understand the code correctly, but I might be wrong and don't understand it well enough. I hope to have time to work on it this week.
Hey @FabianHertwig I think I figured out how to call my Azure deployment of codestral through litellm. You are not gonna like it.
LiteLLM does not support the /fim/completions endpoint, so what they've done to hack around this in the Codestral case is to add an exception to the /completions endpoint: when the model in the config starts with text-completion-codestral, that call is pointed at the base URL specified in the config.
This means that you can set up your config like this:
model_list:
  - model_name: codestral-latest
    litellm_params:
      model: text-completion-codestral/codestral-latest
      api_base: https://Codestral-2501-xxxxx.xxx.models.ai.azure.com/v1/fim/completions
      api_key: xxxxxx
And then you need to call litellm with the /completions endpoint like this:
import requests

url = "http://<litellm_url>/v1/completions"
headers = {
    'Authorization': 'Bearer sk-xxx',
    'Content-Type': 'application/json'
}
data = {
    "model": "codestral-latest",
    "prompt": "def multiply(a,b): ",
    "suffix": "return c",
    "max_tokens": 4096,
    "temperature": 0.01,
    "stop": [
        "[PREFIX]",
        "[SUFFIX]",
        "/src/",
        "#- coding: utf-8",
        "```"
    ],
}
response = requests.post(url, headers=headers, json=data)
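Continuing the snippet above: if the call succeeds, the proxy should return an OpenAI-style completions response (that is what LiteLLM's /v1/completions normally emits, so treat the exact schema as an assumption), and the generated code can be read out like this:

response.raise_for_status()  # fail loudly on HTTP errors
completion_text = response.json()["choices"][0]["text"]  # OpenAI-style completions schema
print(completion_text)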
I haven't found out how to make Continue call the /completions endpoint with a suffix parameter yet -> you probably need to find a provider whose requests are also formatted like this.
@krrishdholakia @ishaan-jaff If you guys see this -> please implement the fim/completions endpoint :)
You can make Continue call the /completions endpoint with a suffix parameter if you set your Continue config.yaml like this:
- name: Autocompletion
  provider: siliconflow
  model: codestral-latest
  apiBase: https://<litellm_api_base>/
  apiKey: sk-xxxxxx
  roles:
    - autocomplete
The trick is to set the provider to siliconflow; then the call structure matches what LiteLLM expects, which LiteLLM transforms into a /fim/completions call, which in turn is what the Codestral deployment expects.
Can confirm that it works. Thanks for testing that!
Still seems like a workaround. Ideally the /fim/completions endpoint would be implemented.
@FabianHertwig Did you get the cost tracking in LiteLLM to work properly? I cannot seem to overwrite the 0$ default cost using this here:
- model_name: codestral-latest
  litellm_params:
    model: text-completion-codestral/codestral-latest
    api_base: https://Codestral-xxxxx.swedencentral.models.ai.azure.com/v1/fim/completions
    api_key: os.environ/AZURE_AI_API_KEY
  model_info:
    input_cost_per_token: 0.0000003 # $0.3 per 1M input tokens
    output_cost_per_token: 0.0000009 # $0.9 per 1M output tokens
Yes, it doesn't work :( Still 0$
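One thing that might still be worth trying, purely as an assumption based on LiteLLM's custom-pricing docs and not verified for the text-completion-codestral route: put the cost overrides under litellm_params instead of model_info, for example:

- model_name: codestral-latest
  litellm_params:
    model: text-completion-codestral/codestral-latest
    api_base: https://Codestral-xxxxx.swedencentral.models.ai.azure.com/v1/fim/completions
    api_key: os.environ/AZURE_AI_API_KEY
    input_cost_per_token: 0.0000003   # $0.3 per 1M input tokens (placement is an assumption)
    output_cost_per_token: 0.0000009  # $0.9 per 1M output tokens (placement is an assumption)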
Any news about this?
For my setup:
- Codestral-2501 deployed in Azure AI Foundry
- Added to LiteLLM proxy
- Using LiteLLM URL and API_KEY in Continue.dev for autocomplete
The chat (/chat/completions endpoint) works well, but none of the solutions provided above seems to make the autocomplete (/fim/completions endpoint) work.
Setting siliconflow as the provider calls /chat/completions when triggering autocomplete in Continue.dev, which doesn't give helpful completions.
model_list:
  - model_name: codestral-latest
    litellm_params:
      model: text-completion-codestral/codestral-latest
      api_base: https://Codestral-2501-xxxxx.xxx.models.ai.azure.com/v1/fim/completions
      api_key: xxxxxx
"model": "codestral-latest",
"prompt": "def multiply(a,b): ",
"suffix": "return c",
"max_tokens": 4096,
"temperature": 0.01,
"stop": [
    "[PREFIX]",
    "[SUFFIX]",
    "/src/",
    "#- coding: utf-8",
    "```"
],
Thanks for that @gustavhertz !
I wasn't able to add the model through the UI, so I created a config.yaml, added your config lines (with the official Mistral API base URL), and this request worked.
Now I will try to make Continue work with it too.
EDIT: with the latest Continue version, it works out of the box!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.