[Feature]: Codestral from provider azure_ai
The Feature
Hi!
I would love support for the /fim/completions endpoint when hosting Codestral through Azure AI. The endpoint looks like this: https://Codestral-2501-xxxxxx.models.ai.azure.com/v1/fim/completions
There seem to be hacks around this using the model provider text-completion-codestral, but I haven't gotten them to work.
I've tried both of these setups:
- model_name: codestral-2501
  litellm_params:
    model: text-completion-codestral/codestral-2501
    api_key: os.environ/AZURE_AI_API_KEY
    api_base: https://Codestral-2501-xxxx.models.ai.azure.com/v1/fim/completions
and
- model_name: codestral-2501
  litellm_params:
    model: azure_ai/codestral-2501
    api_key: os.environ/AZURE_AI_API_KEY
    api_base: https://Codestral-2501-xxxx.models.ai.azure.com/v1/fim/completions
Please advise on whether this should work, or if this is indeed a feature we need!
Motivation, pitch
This would enable code autocompletion with Codestral hosted on Azure AI, which is a pretty common use case.
Are you a ML Ops Team?
No
Twitter / LinkedIn details
No response
This seems to be the workaround https://github.com/BerriAI/litellm/issues/5502
Linking this for reference. I am also interested in a solution.
@FabianHertwig Thanks for the link. The suggested workaround seems to work for codestral-latest and codestral-2407 but not for codestral-2501 which is the model I'm interested in. I tried the linked suggestion without any success.
I am interested in seeing if anyone got a workaround to work :)
For me it works with this config when sending requests through Python, but I want to make it work with continue.dev, which fails so far.
Config:
- model_name: codestral-latest
  litellm_params:
    model: text-completion-codestral/codestral-2501
    api_base: "https://Codestral-2501-xxxx.swedencentral.models.ai.azure.com/v1/fim/completions"
    api_key: os.environ/KEY_XXXX
Python script:
import requests
import logging
import os

logging.basicConfig(level=logging.INFO)


def get_models(api_base):
    """List the models exposed by the LiteLLM proxy."""
    models_url = f"{api_base.rstrip('/')}/models"
    params = {"return_wildcard_routes": "false"}
    try:
        models_response = requests.get(models_url, headers=headers, params=params)
        models_response.raise_for_status()
        logging.info("Models Status Code: %s", models_response.status_code)
        logging.info("Available Models: %s", models_response.text)
        return models_response.json()
    except requests.exceptions.RequestException as e:
        logging.error("Error fetching models: %s", e)
        return None


def callCodestral(api_base, prompt, suffix, max_tokens=50, temperature=0.7):
    """Send a prompt/suffix pair to the proxy's /v1/completions endpoint."""
    url = f"{api_base.rstrip('/')}/v1/completions"
    payload = {
        "prompt": prompt,
        "suffix": suffix,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "model": "codestral-latest",
    }
    try:
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()  # Raises an HTTPError for bad responses
        logging.info("Status Code: %s", response.status_code)
        logging.info("Response Body: %s", response.text)
        return response.json()
    except requests.exceptions.RequestException as e:
        logging.error("An error occurred: %s", e)
        return None


def callGpt4o(api_base, prompt, max_tokens=50, temperature=0.7):
    """Send a chat request to the proxy's /v1/chat/completions endpoint."""
    url = f"{api_base.rstrip('/')}/v1/chat/completions"
    payload = {
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "model": "gpt-4o",
        "stream": False
    }
    try:
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()  # Raises an HTTPError for bad responses
        logging.info("Status Code: %s", response.status_code)
        logging.info("Response Body: %s", response.text)
        return response.json()
    except requests.exceptions.RequestException as e:
        logging.error("An error occurred: %s", e)
        return None


api_key = ""
api_base = "http://localhost:4000"
headers = {
    "Content-Type": "application/json",
    "x-goog-api-key": api_key
}
logging.info(headers)

get_models(api_base)
# callGpt4o(api_base, "def hello_world():\n print('Hello, world!')\n\nhello_world()")
callCodestral(api_base, "def calculate_pi(iterations):", " return result")
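A side note on the headers in the script above: the x-goog-api-key header is unusual for a LiteLLM proxy and presumably only works here because the proxy runs without key auth. If a master or virtual key is configured, the proxy expects a bearer token instead; a hedged alternative with a placeholder environment variable:

# Assumed env var name; the LiteLLM proxy authenticates via an Authorization: Bearer header.
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ.get('LITELLM_API_KEY', 'sk-xxxx')}",
}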
I also want it to work for Continue, so let's try to make it work together! I'll try your script too.
For autocomplete, Continue calls url = f"{api_base.rstrip('/')}/fim/completions", not url = f"{api_base.rstrip('/')}/v1/completions" (this depends a bit on which provider you configure in Continue, but if you set it to Mistral it does).
That doesn't work with my setup.
Interesting! Makes sense, as the Mistral API is https://api.mistral.ai/v1/fim/completions (https://docs.mistral.ai/api/#tag/fim).
I wasn't able to debug the Continue side yet and couldn't see what Continue actually sends. How were you able to see it?
But it looks like LiteLLM actually needs the /v1/fim/completions endpoint to really be compatible.
The LiteLLM docs are also a bit confusing, as there is https://docs.litellm.ai/docs/providers/mistral and https://docs.litellm.ai/docs/providers/codestral, and both pages state "All models listed here https://docs.mistral.ai/platform/endpoints are supported", which is probably only true for the completion use case.
I set up a local proxy between Continue and LiteLLM to log all the requests and responses. This is what the request looks like:
{
  "timestamp": "2025-03-14T11:50:33.031801",
  "method": "POST",
  "path": "/fim/completions",
  "query_params": "",
  "headers": {
    "accept": "application/json",
    "accept-encoding": "gzip, deflate, br",
    "authorization": "REDACTED",
    "content-length": "3440",
    "content-type": "application/json",
    "user-agent": "node-fetch",
    "x-api-key": "XXXXX",
    "host": "0.0.0.0:8100",
    "connection": "close"
  },
  "body": {
    "model": "codestral-2501",
    "prompt": "STUFF",
    "suffix": "STUFF",
    "max_tokens": 4096,
    "temperature": 0.01,
    "stop": [
      "[PREFIX]",
      "[SUFFIX]",
      "\n+++++ ",
      "/src/",
      "#- coding: utf-8",
      "```"
    ],
    "stream": true
  }
}
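For anyone who wants to reproduce this, here is a minimal sketch of such a logging proxy. The port 8100 matches the host header in the log above and the LiteLLM address matches the earlier script, but both are assumptions about this particular setup; streamed responses are simply buffered and logged, not re-streamed.

# Minimal logging reverse proxy: Continue -> this script (port 8100) -> LiteLLM.
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

UPSTREAM = "http://localhost:4000"  # assumed LiteLLM proxy address
logging.basicConfig(level=logging.INFO)


class LoggingProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and log the incoming request body.
        length = int(self.headers.get("content-length", 0))
        body = self.rfile.read(length)
        logging.info("POST %s\n%s", self.path, body.decode("utf-8", errors="replace"))

        # Forward to LiteLLM, dropping hop-specific headers.
        skip = {"host", "content-length", "accept-encoding"}
        fwd_headers = {k: v for k, v in self.headers.items() if k.lower() not in skip}
        resp = requests.post(f"{UPSTREAM}{self.path}", data=body, headers=fwd_headers)
        logging.info("Response %s\n%s", resp.status_code, resp.text)

        # Return the (buffered) upstream response to the client.
        self.send_response(resp.status_code)
        self.send_header("Content-Type", resp.headers.get("Content-Type", "application/json"))
        self.end_headers()
        self.wfile.write(resp.content)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8100), LoggingProxy).serve_forever()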
The other option is to use a pass-through endpoint, but I struggled to get the keys to work correctly there.
I have been looking through the codebase a bit.
There is definitely no /fim/completions endpoint. Here is where all the endpoints are defined:
https://github.com/BerriAI/litellm/blob/b5c32c913bc10b56362a140794f964f4d0c57ba1/litellm/proxy/proxy_server.py#L3467-L3501
There seems to be an implementation for Codestral FIM here, but I haven't had enough time yet to understand it: litellm/llms/codestral/completion
Here is also something about FIM completion in the Vertex provider: https://github.com/BerriAI/litellm/blob/b5c32c913bc10b56362a140794f964f4d0c57ba1/litellm/llms/vertex_ai/vertex_ai_partner_models/main.py#L171
Maybe next week I'll have more time to take a closer look and be able to contribute the endpoint implementation.
Hi again. I share your view that the endpoint is not implemented. The Codestral implementation in litellm/llms/codestral/completion is not reachable if I understand the code correctly, but I might be wrong and don't understand it well enough. I hope to have time to work on it this week.
Hey @FabianHertwig I think I figured out how to call my Azure deployment of codestral through litellm. You are not gonna like it.
LiteLLM does not support the /fim/completions endpoint, so what they've done to hack around this in the Codestral case is to add an exception to the /completions endpoint: when the model in the config starts with text-completion-codestral, that call is pointed at the base URL specified in the config.
This means that you can set up your config like this:
model_list:
  - model_name: codestral-latest
    litellm_params:
      model: text-completion-codestral/codestral-latest
      api_base: https://Codestral-2501-xxxxx.xxx.models.ai.azure.com/v1/fim/completions
      api_key: xxxxxx
And then you need to call litellm with the /completions endpoint like this:
import requests

url = "http://<litellm_url>/v1/completions"
headers = {
    'Authorization': 'Bearer sk-xxx',
    'Content-Type': 'application/json'
}
data = {
    "model": "codestral-latest",
    "prompt": "def multiply(a,b): ",
    "suffix": "return c",
    "max_tokens": 4096,
    "temperature": 0.01,
    "stop": [
        "[PREFIX]",
        "[SUFFIX]",
        "/src/",
        "#- coding: utf-8",
        "```"
    ],
}
response = requests.post(url, headers=headers, json=data)
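Continuing the snippet above: if the call succeeds, the proxy should return an OpenAI-style completions response (that is what LiteLLM's /v1/completions normally emits, so treat the exact schema as an assumption), and the generated code can be read out like this:

response.raise_for_status()  # fail loudly on HTTP errors
completion_text = response.json()["choices"][0]["text"]  # OpenAI-style completions schema
print(completion_text)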
I haven't found out how to make Continue call the /completions endpoint with a suffix parameter yet -> you probably need to find a provider whose requests are also formatted like this.
@krrishdholakia @ishaan-jaff If you guys see this -> please implement the fim/completions endpoint :)
You can make Continue call the /completions endpoint with a suffix parameter if you set your Continue config.yaml like this:
- name: Autocompletion
  provider: siliconflow
  model: codestral-latest
  apiBase: https://<litellm_api_base>/
  apiKey: sk-xxxxxx
  roles:
    - autocomplete
The trick is to set the provider to siliconflow; then the call structure matches what LiteLLM expects, which LiteLLM transforms into a /fim/completions call, which in turn is what the Codestral deployment expects.
Can confirm that it works. Thanks for testing that!
Still seems like a workaround. Ideally the /fim/completions endpoint would be implemented.
@FabianHertwig Did you get the cost tracking in LiteLLM to work properly? I cannot seem to overwrite the 0$ default cost using this here:
- model_name: codestral-latest
  litellm_params:
    model: text-completion-codestral/codestral-latest
    api_base: https://Codestral-xxxxx.swedencentral.models.ai.azure.com/v1/fim/completions
    api_key: os.environ/AZURE_AI_API_KEY
  model_info:
    input_cost_per_token: 0.0000003 # $0.3 per 1M input tokens
    output_cost_per_token: 0.0000009 # $0.9 per 1M output tokens
Yes, it doesn't work :( Still 0$
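One thing that might still be worth trying, purely as an assumption based on LiteLLM's custom-pricing docs and not verified for the text-completion-codestral route: put the cost overrides under litellm_params instead of model_info, for example:

- model_name: codestral-latest
  litellm_params:
    model: text-completion-codestral/codestral-latest
    api_base: https://Codestral-xxxxx.swedencentral.models.ai.azure.com/v1/fim/completions
    api_key: os.environ/AZURE_AI_API_KEY
    input_cost_per_token: 0.0000003   # $0.3 per 1M input tokens (placement is an assumption)
    output_cost_per_token: 0.0000009  # $0.9 per 1M output tokens (placement is an assumption)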
Any news about this?
For my setup:
- Codestral-2501 deployed in Azure AI Foundry
- Added to LiteLLM proxy
- Using LiteLLM URL and API_KEY in Continue.dev for autocomplete
The chat (/chat/completions endpoint) works well, but none of the solutions provided above seems to make the autocomplete (/fim/completions endpoint) work.
Setting siliconflow as the provider calls /chat/completions when triggering autocomplete in Continue.dev, which doesn't give helpful completions.
model_list:
  - model_name: codestral-latest
    litellm_params:
      model: text-completion-codestral/codestral-latest
      api_base: https://Codestral-2501-xxxxx.xxx.models.ai.azure.com/v1/fim/completions
      api_key: xxxxxx
"model": "codestral-latest",
"prompt": "def multiply(a,b): ",
"suffix": "return c",
"max_tokens": 4096,
"temperature": 0.01,
"stop": [
    "[PREFIX]",
    "[SUFFIX]",
    "/src/",
    "#- coding: utf-8",
    "```"
],
Thanks for that @gustavhertz !
I wasn't able to add the model through the UI, so I created a config.yaml, added your config lines (with the official Mistral API base URL), and this request worked.
Now I will try to make Continue work with it too.
EDIT: with the latest Continue version, it works out of the box!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.