
[Bug]: Proxy is converting /v1/completions endpoint to /v1/chat/completions data structure

Open · bufferoverflow opened this issue · 2 comments

What happened?

The proxy converts requests sent to the /v1/completions endpoint into the /v1/chat/completions data structure.

Payload sent:

curl -s --insecure http://0.0.0.0:8000/v1/completions -d '{ "prompt": "def print_hello_world():", "model": "starcoder2-3b"}' -H "Content-Type: application/json"

Using the latest git version: a311788f0da7fb052499e14463c18d1c84e6d739

Starting litellm with: poetry run litellm -c local.yml --port 8000 --detailed_debug

---
model_list:
  - model_name: starcoder2-3b
    litellm_params:
      api_base: https://vllm.example.com/v1
      api_key: "os.environ/API_KEY"
      model: openai/starcoder2-3b
      stream_timeout: 5
    model_info:
      mode: completion
litellm_settings:
  drop_params: true
  num_retries: 3
  request_timeout: 20
  allowed_fails: 3

general_settings:
  background_health_checks: true
  health_check_interval: 300
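
For context, here is the equivalent request reshaped into the chat data structure; with the openai/ prefix both endpoints appear to end up on this chat path upstream (a sketch against the config above, same port and model name):

curl -s http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "starcoder2-3b", "messages": [{"role": "user", "content": "def print_hello_world():"}]}'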

Calling the vLLM OpenAI-compatible API directly works smoothly:

$ curl -s --insecure https://vllm.example.com/v1/completions -d '{ "prompt": "def print_hello_world():", "model": "starcoder2-3b"}'   -H "Content-Type: application/json"  | jq
{
  "id": "cmpl-00a6b5104c024c71bc0ca4b001946f95",
  "object": "text_completion",
  "created": 1712766147,
  "model": "starcoder2-3b",
  "choices": [
    {
      "index": 0,
      "text": "\n    print(\"Hello Python\")\n    print(\"Hello World\")\n\n\ndef main",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "total_tokens": 23,
    "completion_tokens": 16
  }
}

Relevant log output

18:16:37 - LiteLLM:INFO: utils.py:1112 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://vllm.example.com/v1/ \
-d '{'model': 'starcoder2-3b', 'messages': [{'role': 'user', 'content': 'def print_hello_world():'}], 'extra_body': {}}'
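
For contrast, a true pass-through for this request would be expected to hit the completions route with the prompt intact, roughly like this (a sketch of the expected outgoing request, not an actual log line):

curl -X POST \
https://vllm.example.com/v1/completions \
-d '{"model": "starcoder2-3b", "prompt": "def print_hello_world():"}'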


bufferoverflow · Apr 10 '24 16:04

Hi @bufferoverflow, I believe the fix is behind this PR: https://github.com/BerriAI/litellm/pull/2709

Will update this ticket once it's out

krrishdholakia · Apr 10 '24 18:04

Thanks @krrishdholakia, I will give that PR a try and report back.

bufferoverflow · Apr 10 '24 19:04

Setting text-completion-openai within the model definition did the trick, thanks @krrishdholakia!

  - model_name: starcoder2-3b
    litellm_params:
      api_base: https://vllm.example.com/v1
      api_key: THIS_IS_UNUSED
      model: text-completion-openai/starcoder2-3b
      stream_timeout: 5
    model_info:
      mode: completion
      metadata: >
        StarCoder2 trained with The Stack v2 dataset. More information:
        https://huggingface.co/bigcode/starcoder2-3b
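
For anyone else hitting this: a quick way to verify the fix is to replay the original request against the proxy (assuming it still runs on port 8000 as above) and check that the response comes back with "object": "text_completion" rather than a chat completion:

curl -s http://0.0.0.0:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "starcoder2-3b", "prompt": "def print_hello_world():"}'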

bufferoverflow · Apr 15 '24 14:04