[Feature request] Using local Ollama models
This is neither a feature request nor a bug but hopefully others may find it useful.
I wanted to experiment with code refactoring using local models but still using the awesome chatgpt-shell. Here is how I got it to work:
;; your ollama endpoint
(setq chatgpt-shell-api-url-base "http://127.0.0.1:11434")

;; models you have pulled for use with ollama
(setq chatgpt-shell-model-versions
      '("gemma:2b-instruct"
        "zephyr:latest"
        "codellama:instruct"
        "magicoder:7b-s-cl-q4_0"
        "starcoder:latest"
        "deepseek-coder:1.3b-instruct-q5_1"
        "qwen:1.8b"
        "mistral:7b-instruct"
        "orca-mini:7b"
        "orca-mini:3b"
        "openchat:7b-v3.5-q4_0"))
;; override how chatgpt-shell determines the context length
;; NOTE: use this as a template and adjust as needed
(defun chatgpt-shell--approximate-context-length (model messages)
  "Approximate the context length using MODEL and MESSAGES."
  (let* ((tokens-per-message)
         (max-tokens)
         (original-length (floor (/ (length messages) 2)))
         (context-length original-length))
    ;; Remove "ft:" from fine-tuned models and recognize as usual
    (setq model (string-remove-prefix "ft:" model))
    (cond
     ((string-prefix-p "starcoder" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-3-5
            max-tokens 4096))
     ((string-prefix-p "magicoder" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-3-5
            max-tokens 4096))
     ((string-prefix-p "gemma" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "openchat" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "codellama" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "zephyr" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "qwen" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "deepseek-coder" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "mistral" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "orca" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     (t
      (error "Don't know '%s', so can't approximate context length" model)))
    (while (> (chatgpt-shell--num-tokens-from-messages
               tokens-per-message messages)
              max-tokens)
      (setq messages (cdr messages)))
    (setq context-length (floor (/ (length messages) 2)))
    (unless (eq original-length context-length)
      (message "Warning: chatgpt-shell context clipped"))
    context-length))
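If you pull additional models, add them to chatgpt-shell-model-versions and give the template above a matching cond branch. A minimal sketch of switching to a newly pulled model; the model name is illustrative, and it assumes chatgpt-shell-model-version accepts a model-name string (it also accepted an index into chatgpt-shell-model-versions in versions from this era):

;; Hypothetical example: after `ollama pull llama3`, register it and start a
;; shell with it.  A matching cond branch (with a context window checked
;; against the model card) would also be added to the template above.
(add-to-list 'chatgpt-shell-model-versions "llama3:latest")
(setq chatgpt-shell-model-version "llama3:latest")
(chatgpt-shell)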
I have found that the gemma models integrate best, with correct code formatting, etc., but your mileage may vary.
The majority of chatgpt-shell features work, and you can even change models with C-c C-v.
Thanks for this Glen! This is impressive and great to see. I'd been meaning to create a higher-level abstraction that reuses more chatgpt-shell things, maybe on top of shell-maker: https://xenodium.com/a-shell-maker.
I've not had a chance to play with these models. I'm guessing they're also implementing OpenAI's API/schema, which would make reusing more things easier for chatgpt-shell.
Okay. I just got this package working with open-webui, which I really like as a wrapper for Ollama.
The first thing I had to do was open the settings for the currently logged-in user by clicking the top-right user bubble, then click Settings > Account > API keys, and set that key as your chatgpt-shell-openai-key (one way to do this is sketched after the config below). Then adapt this code to fit your Open WebUI instance:
(after! chatgpt-shell
  ;; your ollama endpoint
  (setq chatgpt-shell-api-url-base "http://wydrogen:3000"
        chatgpt-shell-api-url-path "/ollama/api/chat")

  ;; models you have pulled for use with ollama
  (setq chatgpt-shell-model-versions
        '("dolphin-mixtral:latest"
          "llama3:latest"
          "llava:13b"
          "gemma2:27b"
          "deepseek-coder-v2:latest"))

  (defvar chatgpt-shell-model-settings
    (list (cons "llama3:latest" '((max-tokens . 8192)))
          (cons "llava:13b" '((max-tokens . 8192)))
          (cons "gemma2:27b" '((max-tokens . 8192)))
          (cons "dolphin-mixtral:latest" '((max-tokens . 8192)))
          (cons "deepseek-coder-v2:latest" '((max-tokens . 8192)))))

  ;; Adapt the above function to our `chatgpt-shell-model-settings'
  (defun chatgpt-shell--approximate-context-length (model messages)
    "Approximate the context length using MODEL and MESSAGES."
    (let* ((tokens-per-message 4)
           (max-tokens)
           (original-length (floor (/ (length messages) 2)))
           (context-length original-length))
      ;; string keys need a string comparison for the alist lookup
      (let ((settings (alist-get model chatgpt-shell-model-settings
                                 nil nil #'string=)))
        (setq max-tokens (alist-get 'max-tokens settings 4096)))
      (while (> (chatgpt-shell--num-tokens-from-messages
                 tokens-per-message messages)
                max-tokens)
        (setq messages (cdr messages)))
      (setq context-length (floor (/ (length messages) 2)))
      (unless (eq original-length context-length)
        (message "Warning: chatgpt-shell context clipped"))
      context-length))

  (defun chatgpt-shell--extract-chatgpt-response (json)
    "Extract ChatGPT response from JSON."
    (if (eq (type-of json) 'cons)
        (let-alist json ;; already parsed
          (or (or .delta.content
                  .message.content)
              .error.message
              ""))
      (if-let (parsed (shell-maker--json-parse-string json))
          (string-trim
           (let-alist parsed
             .message.content))
        (if-let (parsed-error (shell-maker--json-parse-string-filtering
                               json "^curl:.*\n?"))
            (let-alist parsed-error
              .error.message))))))
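For the chatgpt-shell-openai-key mentioned above, here is a minimal sketch of two ways to provide the Open WebUI key; chatgpt-shell-openai-key can be a string or a function, and the auth-source host name below is an assumption that should match wherever you filed the secret:

(require 'auth-source)

;; Option 1: a literal string (placeholder value).
(setq chatgpt-shell-openai-key "your-open-webui-api-key")

;; Option 2: a function, so the key is fetched lazily from auth-source.
;; The :host here is hypothetical; use whatever host you stored it under.
(setq chatgpt-shell-openai-key
      (lambda ()
        (auth-source-pick-first-password :host "wydrogen")))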
I also like using this to remove the "ChatGPT" branding from the prompt:
(defun chatgpt-shell--prompt-pair ()
  "Return a pair with prompt and prompt-regexp."
  (cons
   (format "Ollama(%s)> " (chatgpt-shell--shell-info))
   (rx (seq bol "Ollama" (one-or-more (not (any "\n"))) ">" (or space "\n")))))

(eval '(setf (shell-maker-config-prompt chatgpt-shell--config)
             (car (chatgpt-shell--prompt-pair))))
(eval '(setf (shell-maker-config-prompt-regexp chatgpt-shell--config)
             (cdr (chatgpt-shell--prompt-pair))))
This is really cool @LemonBreezes! Nice work.
I'm guessing since the LLM APIs are the same, most chatgpt-shell features work? Like chatgpt-shell-swap-system-prompt, chatgpt-shell-swap-model-version, and chatgpt-shell-prompt-compose?
Yup. Just tested and all of those work.
Very cool! It's been a long while since I tried any of the offline alternatives. How was your experience setting up? What OS? Hardware specs? How's performance for ya?
Performance is quite good on these models I'm using:
(setq chatgpt-shell-model-versions
      '("dolphin-mixtral:latest"
        "zephyr:latest"
        "llava:latest"
        "llama3:latest"
        "gemma2:27b"
        "deepseek-coder-v2:latest"))
It was also really easy to set up. I just ran
docker run -d -p 3000:8080 --gpus=all -e WEBUI_AUTH=False -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
and then I downloaded the models through the web UI. I do have a really fast computer though: a 3080 Ti and a 1080 Ti GPU, a Ryzen 7950X CPU, and 192 GB of RAM. The 7-8B models type really fast for me, and the dolphin-mixtral 47B model types more slowly but is usable.
I like the privacy aspect of running the models locally. There were a lot of times I wanted to use ChatGPT but was too paranoid to, because humans literally read chat transcripts to tune OpenAI's models.
This is neither a feature request nor a bug but hopefully others may find it useful.
@glenstarchman I've just renamed this as a feature request. Hope that's OK. At some point, I'd like to support different models.
I'd like to make Ollama support happen, but first need some base work. If still keen, please upvote to gauge interest https://github.com/xenodium/chatgpt-shell/issues/244
@glenstarchman I got an error asking to set chatgpt-shell-openai-key. I can run it locally without providing an API key though; for example,
curl http://127.0.0.1:11434/api/generate -d '
{
  "model": "llama3.2",
  "prompt": "Why is the blue sky blue?",
  "stream": false,
  "options": {
    "num_thread": 8,
    "num_ctx": 2024
  }
}'
It returns an answer.
I tried to set chatgpt-shell-openai-key to an empty string, and a 404 error is thrown.
The stable (main) branch doesn't currently lend itself to swapping models easily.
I'm working on it (just got a model from another provider working). It needs cleaning up and some usage to iron out issues. This work will make Ollama support way easier.
It's been quite a bit of work, but I think I'm getting close. If you're keen to see this through, please consider sponsoring this project.
https://github.com/xenodium/chatgpt-shell/commit/15e501844b8af35ae5c5d5b038f3bf51719afe6f adds a basic implementation for llama3.2 (or use version v2.0.6)
I'm an Ollama noob. Installed it today for the first time.
@glenstarchman @yitang @LemonBreezes @gavinhughes fancy giving it a try? Choose a model via M-x chatgpt-shell-swap-model.
Thanks. I tried it on 2.0.9 but was not able to start chatgpt-shell. After chatgpt-shell-swap-model, the error message for chatgpt-shell is:
let*: Wrong number of arguments: shell-maker-start, 6
This is a simple configuration:
(use-package chatgpt-shell
  :ensure nil
  :load-path "~/Downloads/chatgpt-shell/")

(require 'chatgpt-shell)
(require 'chatgpt-shell-ollama)

;; your ollama endpoint
(setq chatgpt-shell-api-url-base "http://0.0.0.0:8080/"
      chatgpt-shell-api-url-path "/ollama/api/chat")
Update shell-maker too please.
I managed to get it working, but there's a bug in chatgpt-shell-swap-model:

- The first time I ran it, only llama3.2:1b was in the candidate list, so typing llama3 selected llama3.2:1b.
- The second time, only llama3.2 was in the candidate list, so llama3 selected llama3.2.

I don't think I have the llama3.2:1b model installed, hence the 404 error below.
I suspect llama3.2 is the model that comes with ollama, so it makes sense to have llama3.2 as the default model.
I had some "smart" logic that was actually confusing. The swapping function doesn't show your current model, as it would be redundant to swap to the same model. Anyway, I've now removed that in https://github.com/xenodium/chatgpt-shell/commit/d9bf622cd87ff0287b8f806a126778bf9ee53c62 Hopefully that's more predictable now. Btw, you'll need v2.0.10 for that.
That change makes sense for a dummy user like me :)
Also, you already defaulted to the llama3.2 model, which I wasn't aware of, so I don't need to swap models; I thought the default would be the OpenAI stuff.
Anyway, a minimal working example which uses llama3.2 by default is below.
(add-to-list 'load-path "~/Downloads/chatgpt-shell/")
(add-to-list 'load-path "~/Downloads/shell-maker")
(require 'chatgpt-shell)
(chatgpt-shell)
Thanks for the quick update.
curl http://localhost:11434/api/tags
returns the list of locally installed models; can this be used to get the model list instead of manually hardcoding model names?
(defun retrieve-json-payload ()
  "Retrieve JSON payload from http://localhost:11434/api/tags."
  (let ((url "http://localhost:11434/api/tags"))
    (let ((command (format "curl -s %s" url)))
      (shell-command-to-string command))))
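Building on that, here is a minimal sketch that parses the response into a list of model names, assuming Emacs 27+ native JSON support and the /api/tags response shape from the Ollama API docs ({"models": [{"name": ...}, ...]}); the function name is hypothetical:

(defun my/ollama-installed-models ()
  "Return the names of locally installed Ollama models.
Queries http://localhost:11434/api/tags and parses the JSON response."
  (let* ((json (shell-command-to-string
                "curl -s http://localhost:11434/api/tags"))
         (parsed (json-parse-string json))      ;; hash table keyed by strings
         (models (gethash "models" parsed)))    ;; vector of model objects
    (mapcar (lambda (model) (gethash "name" model))
            models)))

;; e.g. (setq chatgpt-shell-model-versions (my/ollama-installed-models))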
Nice idea. It needs some infrastructure work to figure out when to call this in a model-agnostic way. Mind filing a separate feature request to automatically populate ollama models from http://localhost:11434/api/tags?
Sure, will do. Documentation for the API is here: https://github.com/ollama/ollama/blob/main/docs/api.md#list-local-models
With chatgpt-shell v2.0.10 and only these config entries, I get an error:
(setq chatgpt-shell-ollama-api-url-base "http://172.x.x.x.:11434")
(setq chatgpt-shell-models '("qwen2.5-coder:14b" "minicpm-v:latest" "llama3.2-vision:latest"))
Debugger entered--Lisp error: (error "Could not find a model. Missing model setup?")
  signal(error ("Could not find a model. Missing model setup?"))
  error("Could not find a model. Missing model setup?")
  chatgpt-shell-model-version()
The above two entries were the only customization I did. Do I have to customize model parameters for each model?
I'll modify the error so it's more descriptive, but for now you'll have to set it up like this:
(setq chatgpt-shell-models
      (list (chatgpt-shell-ollama-make-model
             :version "llama3.2"
             :token-width 4 ;; approx chars per token
             :context-window 8192)
            (chatgpt-shell-ollama-make-model
             :version "llama3.2:1b"
             :token-width 4 ;; approx chars per token
             :context-window 8192)
            (chatgpt-shell-ollama-make-model
             :version "gemma2:2b"
             :token-width 4 ;; approx chars per token
             :context-window 8192)))
If you do add more models, please try to contribute them to chatgpt-shell-ollama.el in a PR. I'm a noob to Ollama, so it'd be great to have models added by folks who are actively using them.
With this change
(setq chatgpt-shell-models
      (list (chatgpt-shell-ollama-make-model
             :version "qwen2.5-coder:14b"
             :token-width 4 ;; approx chars per token
             :context-window 32768)
            (chatgpt-shell-ollama-make-model
             :version "llama3.2-vision:latest"
             :token-width 4 ;; approx chars per token
             :context-window 32768)
            (chatgpt-shell-ollama-make-model
             :version "minicpm-v:latest"
             :token-width 4 ;; approx chars per token
             :context-window 32768)))
I get
Ollama(qwen2.5-coder:14b/Programming)> who are you?
<shell-maker-end-of-prompt>
curl: (22) The requested URL returned error: 400
curl: (3) bad range in URL position 11:
messages:[role:system,role:user]
^
This is with version 2.1.1 on GNU Emacs 30.0.92 (build 2, x86_64-w64-mingw32) of 2024-10-30, running on Windows 11. Strangely, the above configuration works perfectly in Emacs on WSL2.
I suspect it's an Ollama/installation issue, but try (setq shell-maker-logging t) and post the content from the chatgpt-shell logs buffer just in case. There's a temp file with the request (please post that too). We can look further.
We now have initial implementations for Claude, Gemini, and Ollama. Gonna close this feature request, as the majority of the work to go multi-model is now completed and the mentioned models are working.
For anyone who had been anticipating multi-model support, please consider sponsoring. There was quite a bit of work needed to get here.
Follow https://github.com/xenodium/chatgpt-shell/issues/253 for populating model list from API.
(setq shell-maker-logging t)
Async Command v2
(curl http://172.18.x.x:11434/api/chat --fail-with-body --no-progress-meter -m 600 -d @c:/Users/sivar/AppData/Local/Temp/shell-maker/curl-data)
Stderr
curl: (22) The requested URL retu
Filter pending
nil
Filter output
{"error":"invalid character 'm' looking for beginning of value"}
Filter combined
{"error":"invalid character 'm' looking for beginning of value"}
Stderr
rned error: 400
curl: (3) bad range in URL position 11:
messages:[role:system,role:user]
^
Sentinel
Exit status: 3
The curl-data contents are:
{"model":"qwen2.5-coder:14b","messages":[{"role":"system","content":"The user is a programmer with very limited time.\n You treat their time as precious. You do not repeat obvious things, including their query.\n You are as concise as possible in responses.\n You never apologize for confusions because it would waste their time.\n You use markdown liberally to structure responses.\n Always show code snippets in markdown blocks with language labels.\n Don't explain code snippets.\n Whenever you output updated code for the user, only show diffs, instead of entire snippets.\n# System info\n\n## OS details\n/usr/bin/bash: line 1: ver: command not found\n## Editor\nGNU Emacs 30.0.92 (build 2, x86_64-w64-mingw32)\n of 2024-10-30"},{"role":"user","content":"who are you?"}],"stream":true}
I see this in the file: /usr/bin/bash: line 1: ver: command not found. My (getenv "SHELL") is C:/msys64/usr/bin/bash.exe. Can you make it check SHELL and use that if set? I'm assuming that's where the issue is.
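In the meantime, a possible workaround sketch; it is an untested assumption that routing Emacs subprocesses through MSYS bash also fixes the curl invocation:

;; Untested workaround sketch: make Emacs spawn subprocess commands through
;; MSYS bash rather than the default Windows shell.
(setq shell-file-name "C:/msys64/usr/bin/bash.exe")
(setenv "SHELL" shell-file-name)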
Does this run on the command line? curl http://172.x.x.x:11434/api/chat --fail-with-body --no-progress-meter -m 600 -d @c:/Users/sivar/AppData/Local/Temp/shell-maker/curl-data
curl http://172.18.16.1:11434/api/chat --fail-with-body --no-progress-meter -m 600 -d @c:/Users/sivar/AppData/Local/Temp/shell-maker/curl-data
{"model":"qwen2.5-coder:14b","created_at":"2024-11-27T16:43:59.7996511Z","message":{"role":"assistant","content":"I"},"done":false}
{"model":"qwen2.5-coder:14b","created_at":"2024-11-27T16:44:00.2362922Z","message":{"role":"assistant","content":" am"},"done":false}
This was run on MSYS bash, C:/msys64/usr/bin/bash.exe.