continue icon indicating copy to clipboard operation
continue copied to clipboard

Codestral response is suddenly 4 times slower in Continue when compared to others

Open abishekmuthian opened this issue 1 year ago • 1 comments

Before submitting your bug report

Relevant environment info

- OS:Linux 6.9
- GPU: Nvidia 4090 Mobile (16GB VRAM)
- Provider: Ollama
- Continue: 0.8.43
- IDE: VSCode
- Model: Codestral
- config.json:
  
{
  "models": [
    {
      "title": "Codestral",
      "provider": "ollama",
      "model": "codestral:latest"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
      "title": "Codestral",
      "provider": "ollama",
      "model": "codestral:latest"
  },
  "allowAnonymousTelemetry": false,
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}

Description

Codestral response via ollama has suddenly become very slow with latest updates, its at least 4 times slower when compared to response timings from other other apps like open-webui, curl are very fast. Other models like deepseek-coder-v2 is working fine in Continue.

To reproduce

  1. Setup Continue to use Codestral for chat and tabautocomplete.
  2. Watch the the logs for ollama e.g. in docker its docker logs --follow ollama
  3. Open Continue chat and give any prompt.
  4. Note response time in the ollama logs and notice the latency in the chat.
  5. Give the same prompt in curl or in open-webui to Codestral and note the response time.

Log output

Note: Model is already loaded in VRAM before testing.

Ollama logs for Codestral via Continue

[GIN] 2024/07/30 - 10:45:28 | 200 |         1m33s |      172.17.0.1 | POST     "/api/chat"

Ollama logs for Codestral via open-webui

[GIN] 2024/07/30 - 10:46:32 | 200 |  5.142411188s |      172.17.0.1 | POST     "/api/chat"

abishekmuthian avatar Jul 30 '24 10:07 abishekmuthian