[Feature request] Using local Ollama models
This is neither a feature request nor a bug but hopefully others may find it useful.
I wanted to experiment with code refactoring using local models but still using the awesome chatgpt-shell. Here is how I got it to work:
;; your ollama endpoint
(setq chatgpt-shell-api-url-base "http://127.0.0.1:11434")

;; models you have pulled for use with ollama
(setq chatgpt-shell-model-versions
      '("gemma:2b-instruct"
        "zephyr:latest"
        "codellama:instruct"
        "magicoder:7b-s-cl-q4_0"
        "starcoder:latest"
        "deepseek-coder:1.3b-instruct-q5_1"
        "qwen:1.8b"
        "mistral:7b-instruct"
        "orca-mini:7b"
        "orca-mini:3b"
        "openchat:7b-v3.5-q4_0"))
;; override how chatgpt-shell determines the context length
;; NOTE: use this as a template and adjust as needed
(defun chatgpt-shell--approximate-context-length (model messages)
  "Approximate the context length using MODEL and MESSAGES."
  (let* ((tokens-per-message)
         (max-tokens)
         (original-length (floor (/ (length messages) 2)))
         (context-length original-length))
    ;; Remove "ft:" from fine-tuned models and recognize as usual
    (setq model (string-remove-prefix "ft:" model))
    (cond
     ((string-prefix-p "starcoder" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-3-5
            max-tokens 4096))
     ((string-prefix-p "magicoder" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-3-5
            max-tokens 4096))
     ((string-prefix-p "gemma" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "openchat" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "codellama" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "zephyr" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "qwen" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "deepseek-coder" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "mistral" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     ((string-prefix-p "orca" model)
      (setq tokens-per-message 4
            ;; https://platform.openai.com/docs/models/gpt-4
            max-tokens 8192))
     (t
      (error "Don't know '%s', so can't approximate context length" model)))
    (while (> (chatgpt-shell--num-tokens-from-messages
               tokens-per-message messages)
              max-tokens)
      (setq messages (cdr messages)))
    (setq context-length (floor (/ (length messages) 2)))
    (unless (eq original-length context-length)
      (message "Warning: chatgpt-shell context clipped"))
    context-length))
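If you pull additional models, add them to chatgpt-shell-model-versions and give the template above a matching cond branch. A minimal sketch of switching to a newly pulled model; the model name is illustrative, and it assumes chatgpt-shell-model-version accepts a model-name string (it also accepted an index into chatgpt-shell-model-versions in versions from this era):

;; Hypothetical example: after `ollama pull llama3`, register it and start a
;; shell with it.  A matching cond branch (with a context window checked
;; against the model card) would also be added to the template above.
(add-to-list 'chatgpt-shell-model-versions "llama3:latest")
(setq chatgpt-shell-model-version "llama3:latest")
(chatgpt-shell)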
I have found that the gemma models integrate best, with correct code formatting, etc., but your mileage may vary.
The majority of chatgpt-shell features work, and you can even change models with C-c C-v.
Thanks for this Glen! This is impressive and great to see. I'd been meaning to create a higher-level abstraction that reuses more chatgpt-shell things, maybe on top of shell-maker: https://xenodium.com/a-shell-maker.
I've not had a chance to play with these models. I'm guessing they're also implementing OpenAI's API/schema, which would make reusing more things easier for chatgpt-shell.
Okay. I just got this package working with open-webui, which I really like as a wrapper for Ollama.
The first thing I had to do was open the settings for the currently logged-in user by clicking the top-right user bubble, then click Settings > Account > API keys, and set that key as your chatgpt-shell-openai-key (one way to do this is sketched after the config below). Then adapt this code to fit your Open WebUI instance:
(after! chatgpt-shell
  ;; your ollama endpoint
  (setq chatgpt-shell-api-url-base "http://wydrogen:3000"
        chatgpt-shell-api-url-path "/ollama/api/chat")

  ;; models you have pulled for use with ollama
  (setq chatgpt-shell-model-versions
        '("dolphin-mixtral:latest"
          "llama3:latest"
          "llava:13b"
          "gemma2:27b"
          "deepseek-coder-v2:latest"))

  (defvar chatgpt-shell-model-settings
    (list (cons "llama3:latest" '((max-tokens . 8192)))
          (cons "llava:13b" '((max-tokens . 8192)))
          (cons "gemma2:27b" '((max-tokens . 8192)))
          (cons "dolphin-mixtral:latest" '((max-tokens . 8192)))
          (cons "deepseek-coder-v2:latest" '((max-tokens . 8192)))))

  ;; Adapt the above function to our `chatgpt-shell-model-settings'
  (defun chatgpt-shell--approximate-context-length (model messages)
    "Approximate the context length using MODEL and MESSAGES."
    (let* ((tokens-per-message 4)
           (max-tokens)
           (original-length (floor (/ (length messages) 2)))
           (context-length original-length))
      ;; string keys need a string comparison for the alist lookup
      (let ((settings (alist-get model chatgpt-shell-model-settings
                                 nil nil #'string=)))
        (setq max-tokens (alist-get 'max-tokens settings 4096)))
      (while (> (chatgpt-shell--num-tokens-from-messages
                 tokens-per-message messages)
                max-tokens)
        (setq messages (cdr messages)))
      (setq context-length (floor (/ (length messages) 2)))
      (unless (eq original-length context-length)
        (message "Warning: chatgpt-shell context clipped"))
      context-length))

  (defun chatgpt-shell--extract-chatgpt-response (json)
    "Extract ChatGPT response from JSON."
    (if (eq (type-of json) 'cons)
        (let-alist json ;; already parsed
          (or (or .delta.content
                  .message.content)
              .error.message
              ""))
      (if-let (parsed (shell-maker--json-parse-string json))
          (string-trim
           (let-alist parsed
             .message.content))
        (if-let (parsed-error (shell-maker--json-parse-string-filtering
                               json "^curl:.*\n?"))
            (let-alist parsed-error
              .error.message))))))
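For the chatgpt-shell-openai-key mentioned above, here is a minimal sketch of two ways to provide the Open WebUI key; chatgpt-shell-openai-key can be a string or a function, and the auth-source host name below is an assumption that should match wherever you filed the secret:

(require 'auth-source)

;; Option 1: a literal string (placeholder value).
(setq chatgpt-shell-openai-key "your-open-webui-api-key")

;; Option 2: a function, so the key is fetched lazily from auth-source.
;; The :host here is hypothetical; use whatever host you stored it under.
(setq chatgpt-shell-openai-key
      (lambda ()
        (auth-source-pick-first-password :host "wydrogen")))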
I also like using this to remove the "ChatGPT" branding from the prompt:
(defun chatgpt-shell--prompt-pair ()
  "Return a pair with prompt and prompt-regexp."
  (cons
   (format "Ollama(%s)> " (chatgpt-shell--shell-info))
   (rx (seq bol "Ollama" (one-or-more (not (any "\n"))) ">" (or space "\n")))))

(eval '(setf (shell-maker-config-prompt chatgpt-shell--config)
             (car (chatgpt-shell--prompt-pair))))
(eval '(setf (shell-maker-config-prompt-regexp chatgpt-shell--config)
             (cdr (chatgpt-shell--prompt-pair))))
This is really cool @LemonBreezes! Nice work.
I'm guessing since the LLM APIs are the same, most chatgpt-shell features work? Like chatgpt-shell-swap-system-prompt, chatgpt-shell-swap-model-version, and chatgpt-shell-prompt-compose?
Yup. Just tested and all of those work.
Very cool! It's been a long while since I tried any of the offline alternatives. How was your experience setting up? What OS? Hardware specs? How's performance for ya?
Performance is quite good on these models I'm using:
(setq chatgpt-shell-model-versions
      '("dolphin-mixtral:latest"
        "zephyr:latest"
        "llava:latest"
        "llama3:latest"
        "gemma2:27b"
        "deepseek-coder-v2:latest"))
It was also really easy to set up. I just ran
docker run -d -p 3000:8080 --gpus=all -e WEBUI_AUTH=False -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
and then I downloaded the models through the web UI. I do have a really fast computer though: a 3080 Ti and a 1080 Ti GPU, a Ryzen 7950X CPU, and 192 GB of RAM. The 7-8B models type really fast for me, and the dolphin-mixtral 47B model types more slowly but is usable.
I like the privacy aspect of running the models locally. There were a lot of times I wanted to use ChatGPT but was too paranoid to, because humans literally read chat transcripts to tune OpenAI's models.
This is neither a feature request nor a bug but hopefully others may find it useful.
@glenstarchman I've just renamed this as a feature request. Hope that's OK. At some point, I'd like to support different models.
I'd like to make Ollama support happen, but first need some base work. If still keen, please upvote to gauge interest https://github.com/xenodium/chatgpt-shell/issues/244
@glenstarchman I got an error asking to set chatgpt-shell-openai-key. I can run it locally without providing an API key though; for example,
curl http://127.0.0.1:11434/api/generate -d '
{
  "model": "llama3.2",
  "prompt": "Why is the blue sky blue?",
  "stream": false,
  "options": {
    "num_thread": 8,
    "num_ctx": 2024
  }
}'
It returns an answer.
I tried to set chatgpt-shell-openai-key to an empty string, and a 404 error is thrown.
The stable (main) branch doesn't currently lend itself to swapping models easily.
I'm working on it (just got a model from another provider working). It needs cleaning up and some usage to iron out issues. This work will make Ollama support way easier.
It's been quite a bit of work, but I think I'm getting close. If you're keen to see this through, please consider sponsoring this project.
https://github.com/xenodium/chatgpt-shell/commit/15e501844b8af35ae5c5d5b038f3bf51719afe6f adds a basic implementation for llama3.2 (or use version v2.0.6)
I'm an Ollama noob. Installed it today for the first time.
@glenstarchman @yitang @LemonBreezes @gavinhughes fancy giving it a try? Choose a model via M-x chatgpt-shell-swap-model.
Thanks. I tried it on 2.0.9 but was not able to start chatgpt-shell. After chatgpt-shell-swap-model, the error message for chatgpt-shell is:
let*: Wrong number of arguments: shell-maker-start, 6
This is a simple configuration:
(use-package chatgpt-shell
  :ensure nil
  :load-path "~/Downloads/chatgpt-shell/")

(require 'chatgpt-shell)
(require 'chatgpt-shell-ollama)

;; your ollama endpoint
(setq chatgpt-shell-api-url-base "http://0.0.0.0:8080/"
      chatgpt-shell-api-url-path "/ollama/api/chat")
Update shell-maker too please.
I managed to get it working, but there's a bug in chatgpt-shell-swap-model:

- The first time I ran it, only llama3.2:1b was in the candidate list, so typing llama3 selected llama3.2:1b.
- The second time, only llama3.2 was in the candidate list, so llama3 selected llama3.2.

I don't think I have the llama3.2:1b model installed, hence the 404 error below.
I suspect llama3.2 is the model that comes with ollama, so it makes sense to have llama3.2 as the default model.
I had some "smart" logic that was actually confusing. The swapping function doesn't show your current model, as it would be redundant to swap to the same model. Anyway, I've now removed that in https://github.com/xenodium/chatgpt-shell/commit/d9bf622cd87ff0287b8f806a126778bf9ee53c62 Hopefully that's more predictable now. Btw, you'll need v2.0.10 for that.
That change makes sense for a dummy user like me :)
Also, you already defaulted to the llama3.2 model, which I wasn't aware of, so I don't need to swap models; I thought the default would be the OpenAI stuff.
Anyway, a minimal working example which uses llama3.2 by default is below.
(add-to-list 'load-path "~/Downloads/chatgpt-shell/")
(add-to-list 'load-path "~/Downloads/shell-maker")
(require 'chatgpt-shell)
(chatgpt-shell)
Thanks for the quick update.
curl http://localhost:11434/api/tags
returns the list of locally installed models; can this be used to get the model list instead of manually hardcoding model names?
(defun retrieve-json-payload ()
  "Retrieve JSON payload from http://localhost:11434/api/tags."
  (let ((url "http://localhost:11434/api/tags"))
    (let ((command (format "curl -s %s" url)))
      (shell-command-to-string command))))
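Building on that, here is a minimal sketch that parses the response into a list of model names, assuming Emacs 27+ native JSON support and the /api/tags response shape from the Ollama API docs ({"models": [{"name": ...}, ...]}); the function name is hypothetical:

(defun my/ollama-installed-models ()
  "Return the names of locally installed Ollama models.
Queries http://localhost:11434/api/tags and parses the JSON response."
  (let* ((json (shell-command-to-string
                "curl -s http://localhost:11434/api/tags"))
         (parsed (json-parse-string json))      ;; hash table keyed by strings
         (models (gethash "models" parsed)))    ;; vector of model objects
    (mapcar (lambda (model) (gethash "name" model))
            models)))

;; e.g. (setq chatgpt-shell-model-versions (my/ollama-installed-models))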
Nice idea. It needs some infrastructure work to figure out when to call this in a model-agnostic way. Mind filing a separate feature request to automatically populate ollama models from http://localhost:11434/api/tags?
Sure, will do. Documentation for the API is here: https://github.com/ollama/ollama/blob/main/docs/api.md#list-local-models
With chatgpt-shell v2.0.10 and only these config entries, I get an error:
(setq chatgpt-shell-ollama-api-url-base "http://172.x.x.x.:11434")
(setq chatgpt-shell-models '("qwen2.5-coder:14b" "minicpm-v:latest" "llama3.2-vision:latest"))
Debugger entered--Lisp error: (error "Could not find a model. Missing model setup?")
  signal(error ("Could not find a model. Missing model setup?"))
  error("Could not find a model. Missing model setup?")
  chatgpt-shell-model-version()
The above two entries were the only customization I did. Do I have to customize model parameters for each model?
I'll modify the error so it's more descriptive, but for now you'll have to set it up like this:
(setq chatgpt-shell-models
      (list (chatgpt-shell-ollama-make-model
             :version "llama3.2"
             :token-width 4 ;; approx chars per token
             :context-window 8192)
            (chatgpt-shell-ollama-make-model
             :version "llama3.2:1b"
             :token-width 4 ;; approx chars per token
             :context-window 8192)
            (chatgpt-shell-ollama-make-model
             :version "gemma2:2b"
             :token-width 4 ;; approx chars per token
             :context-window 8192)))
If you do add more models, please try to contribute them to chatgpt-shell-ollama.el in a PR. I'm a noob to Ollama, so it'd be great to have models added by folks who are actively using them.
With this change
(setq chatgpt-shell-models
      (list (chatgpt-shell-ollama-make-model
             :version "qwen2.5-coder:14b"
             :token-width 4 ;; approx chars per token
             :context-window 32768)
            (chatgpt-shell-ollama-make-model
             :version "llama3.2-vision:latest"
             :token-width 4 ;; approx chars per token
             :context-window 32768)
            (chatgpt-shell-ollama-make-model
             :version "minicpm-v:latest"
             :token-width 4 ;; approx chars per token
             :context-window 32768)))
I get
Ollama(qwen2.5-coder:14b/Programming)> who are you?
<shell-maker-end-of-prompt>
curl: (22) The requested URL returned error: 400
curl: (3) bad range in URL position 11:
messages:[role:system,role:user]
^
This is with version 2.1.1 on GNU Emacs 30.0.92 (build 2, x86_64-w64-mingw32) of 2024-10-30, running on Windows 11. Strangely, the above configuration works perfectly in Emacs on WSL2.
I suspect it's an Ollama/installation issue, but try (setq shell-maker-logging t) and post the content from the chatgpt-shell logs buffer just in case. There's a temp file with the request (please post that too). We can look further.
We now have initial implementations for Claude, Gemini, and Ollama. Gonna close this feature request, as the majority of the work to go multi-model is now completed and the mentioned models are working.
For anyone who had been anticipating multi-model support, please consider sponsoring. There was quite a bit of work needed to get here.
Follow https://github.com/xenodium/chatgpt-shell/issues/253 for populating model list from API.
(setq shell-maker-logging t)
Async Command v2
(curl http://172.18.x.x:11434/api/chat --fail-with-body --no-progress-meter -m 600 -d @c:/Users/sivar/AppData/Local/Temp/shell-maker/curl-data)
Stderr
curl: (22) The requested URL retu
Filter pending
nil
Filter output
{"error":"invalid character 'm' looking for beginning of value"}
Filter combined
{"error":"invalid character 'm' looking for beginning of value"}
Stderr
rned error: 400
curl: (3) bad range in URL position 11:
messages:[role:system,role:user]
^
Sentinel
Exit status: 3
The curl-data contents are:
{"model":"qwen2.5-coder:14b","messages":[{"role":"system","content":"The user is a programmer with very limited time.\n You treat their time as precious. You do not repeat obvious things, including their query.\n You are as concise as possible in responses.\n You never apologize for confusions because it would waste their time.\n You use markdown liberally to structure responses.\n Always show code snippets in markdown blocks with language labels.\n Don't explain code snippets.\n Whenever you output updated code for the user, only show diffs, instead of entire snippets.\n# System info\n\n## OS details\n/usr/bin/bash: line 1: ver: command not found\n## Editor\nGNU Emacs 30.0.92 (build 2, x86_64-w64-mingw32)\n of 2024-10-30"},{"role":"user","content":"who are you?"}],"stream":true}
I see this in the file: /usr/bin/bash: line 1: ver: command not found. My (getenv "SHELL") is C:/msys64/usr/bin/bash.exe. Can you make it check SHELL and use that if set? I'm assuming that's where the issue is.
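In the meantime, a possible workaround sketch; it is an untested assumption that routing Emacs subprocesses through MSYS bash also fixes the curl invocation:

;; Untested workaround sketch: make Emacs spawn subprocess commands through
;; MSYS bash rather than the default Windows shell.
(setq shell-file-name "C:/msys64/usr/bin/bash.exe")
(setenv "SHELL" shell-file-name)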
Does this run on the command line? curl http://172.x.x.x:11434/api/chat --fail-with-body --no-progress-meter -m 600 -d @c:/Users/sivar/AppData/Local/Temp/shell-maker/curl-data
curl http://172.18.16.1:11434/api/chat --fail-with-body --no-progress-meter -m 600 -d @c:/Users/sivar/AppData/Local/Temp/shell-maker/curl-data
{"model":"qwen2.5-coder:14b","created_at":"2024-11-27T16:43:59.7996511Z","message":{"role":"assistant","content":"I"},"done":false}
{"model":"qwen2.5-coder:14b","created_at":"2024-11-27T16:44:00.2362922Z","message":{"role":"assistant","content":" am"},"done":false}
This was run on MSYS bash, C:/msys64/usr/bin/bash.exe.