Codestral responses are suddenly 4x slower in Continue compared to other clients
Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that reports the same bug
- [X] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: Linux 6.9
- GPU: Nvidia 4090 Mobile (16GB VRAM)
- Provider: Ollama
- Continue: 0.8.43
- IDE: VSCode
- Model: Codestral
- config.json:
```json
{
  "models": [
    {
      "title": "Codestral",
      "provider": "ollama",
      "model": "codestral:latest"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Codestral",
    "provider": "ollama",
    "model": "codestral:latest"
  },
  "allowAnonymousTelemetry": false,
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
```
Description
Codestral responses via Ollama have suddenly become very slow with the latest updates: at least 4 times slower than in other apps such as open-webui or curl, which remain very fast. Other models like deepseek-coder-v2 still work fine in Continue.
To reproduce
- Set up Continue to use Codestral for chat and tab autocomplete.
- Watch the Ollama logs, e.g. in Docker: `docker logs --follow ollama`.
- Open Continue chat and give any prompt.
- Note the response time in the Ollama logs and the latency in the chat.
- Give the same prompt to Codestral via curl or open-webui and note the response time (see the curl sketch below).
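For the direct comparison, a minimal curl sketch against Ollama's /api/chat endpoint (the host/port and the prompt are assumptions for a default local install; adjust to your setup):

```sh
# Time a non-streaming chat request sent directly to Ollama,
# bypassing Continue entirely. Compare the wall time with the
# latency observed in the Continue chat for the same prompt.
time curl -s http://localhost:11434/api/chat -d '{
  "model": "codestral:latest",
  "messages": [{ "role": "user", "content": "Write a binary search in Python." }],
  "stream": false
}'
```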
Log output
Note: Model is already loaded in VRAM before testing.
Ollama logs for Codestral via Continue:
```
[GIN] 2024/07/30 - 10:45:28 | 200 | 1m33s | 172.17.0.1 | POST "/api/chat"
```
Ollama logs for Codestral via open-webui:
```
[GIN] 2024/07/30 - 10:46:32 | 200 | 5.142411188s | 172.17.0.1 | POST "/api/chat"
```
The issue seems to be related to context size (see https://github.com/continuedev/continue/issues/1776). Starting a new chat session makes Codestral usable again, but it is still not as fast as open-webui.
Can confirm this. Codestral is extremely slow (about one word every ~20 seconds) in Continue, while it is blazing fast in Ollama's direct console. A striking difference.
P.S. Yep, looks like adjusting the context size fixes this.
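For anyone hitting the same slowdown, a minimal sketch of the context-size workaround in Continue's config.json, assuming the per-model contextLength option; the 8192 value is an assumption, so pick whatever fits your VRAM:

```json
{
  "models": [
    {
      "title": "Codestral",
      "provider": "ollama",
      "model": "codestral:latest",
      "contextLength": 8192
    }
  ]
}
```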
This issue hasn't been updated in 90 days and will be closed after an additional 10 days without activity. If it's still important, please leave a comment and share any new information that would help us address the issue.
This issue was closed because it wasn't updated for 10 days after being marked stale. If it's still important, please reopen + comment and we'll gladly take another look!