gptel Contact to llamafile-AI on server fails

I have two AIs set up, one on my laptop and one on my desktop:

(use-package gptel
  :config
  (gptel-make-openai "testai"          ;Any name
    :stream t                             ;Stream responses
    :protocol "http"
    :host "localhost:8080"                ;Llama.cpp server location
    :models '("test")
    :key nil)

  (gptel-make-openai "desktop"          ;Any name
    :stream t                             ;Stream responses
    :protocol "http"
    :host "10.0.0.8:8080"               ;Llama.cpp server location
    :models '("test")
    :key nil)

  ;; (setq-default
  ;;  gptel-model   "test"
  ;;  gptel-backend (gptel-make-openai "testai"
  ;;                  :stream t
  ;;                  :protocol "http"
  ;;                  :host "localhost:8080"
  ;;                  :models '("test")))

  (setq-default
   gptel-model   "test"
   gptel-backend (gptel-make-openai "desktop"
                   :stream t
                   :protocol "http"
                   :host "10.0.0.8:8080"
                   :models '("test"))))

If I use the local machine with the defaults that are commented out above, it works. If I try to use the desktop AI, I get the following error:

desktop response error: ((c4bb9327bb265bd639a950ab5ffe93f8 . 0)) Could not parse HTTP response.

The following shell script, which copies a file and feeds it to the desktop AI, works though:

#!/bin/bash
scp $1 [email protected]:/home/alex/wizard
ssh [email protected] "sh ~/.local/bin/wizardcoder-python-34b-v1.0.Q5_K_M.llamafile /home/alex/wizard/$1"

What's the problem?

Mar 06 '24 17:03 nameiwillforget

I'm assuming you're using the server llamafile on your desktop and not the other one.

Try looking at the request log:

Run (setq gptel-log-level 'debug)
Try to use the desktop llamafile and produce the error
Look at the *gptel-log* buffer. The curl command and the HTTP response should be present. You can paste that here.

Mar 06 '24 18:03 karthink

Here is the log:

{
  "gptel": "request headers",
  "timestamp": "2024-03-08 00:42:48"
}
{
  "Content-Type": "application/json"
}
{
  "gptel": "request body",
  "timestamp": "2024-03-08 00:42:48"
}
{
  "model": "test",
  "messages": [
    {
      "role": "system",
      "content": "You are a large language model living in Emacs and a helpful assistant. Respond concisely."
    },
    {
      "role": "user",
      "content": "Can you hear me?"
    }
  ],
  "stream": false,
  "temperature": 1.0
}
{
  "gptel": "request Curl command",
  "timestamp": "2024-03-08 00:42:48"
}
[
  "curl",
  "--disable",
  "--location",
  "--silent",
  "--compressed",
  "-XPOST",
  "-y300",
  "-Y1",
  "-D-",
  "-w(75aecd991c05b7de7a6e566cc05016ad . %{size_header})",
  "-d{\"model\":\"test\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a large language model living in Emacs and a helpful assistant. Respond concisely.\"},{\"role\":\"user\",\"content\":\"Can you hear me?\"}],\"stream\":false,\"temperature\":1.0}",
  "-HContent-Type: application/json",
  "http://localhost:8080/v1/chat/completions"

Mar 07 '24 22:03 nameiwillforget

@nameiwillforget this looks incomplete, did you grab everything in the log buffer?

Mar 08 '24 01:03 karthink

Yes, but there was another gptel-buffer, gptel-curl:

HTTP/1.1 200 OK
Access-Control-Allow-Origin: 
Content-Type: text/event-stream
Keep-Alive: timeout=5, max=5
Server: llama.cpp
Transfer-Encoding: chunked

(d062d386c408445be36c4ba19bd78419 . 160)

Mar 08 '24 15:03 nameiwillforget

I meant in the gptel-log buffer. It looks like the above log is incomplete. Could you try again?

Mar 08 '24 17:03 karthink

I just tried, but now it simply worked. I don't know what changed; I had tried several times before. I changed how llamafile files are executed by default using mimeo. Could that have something to do with it? The LLM was already running before I changed that, I think, so I'm not sure how it would. Anyway, I'll look and try to find out what changed.

Mar 08 '24 18:03 nameiwillforget

So it seems that only the contact between the laptop and the desktop doesn't work. If I do exactly the same thing from the desktop itself, it works. I tried again and I think the resulting log is the same, but here it is nevertheless:

{
  "gptel": "request headers",
  "timestamp": "2024-03-11 21:48:18"
}
{
  "Content-Type": "application/json"
}
{
  "gptel": "request body",
  "timestamp": "2024-03-11 21:48:18"
}
{
  "model": "test",
  "messages": [
    {
      "role": "system",
      "content": "You are a large language model living in Emacs and a helpful assistant. Respond concisely."
    },
    {
      "role": "user",
      "content": "Can you hear me?"
    }
  ],
  "stream": false,
  "temperature": 1.0
}
{
  "gptel": "request Curl command",
  "timestamp": "2024-03-11 21:48:18"
}
[
  "curl",
  "--disable",
  "--location",
  "--silent",
  "--compressed",
  "-XPOST",
  "-y300",
  "-Y1",
  "-D-",
  "-w(8866ba70f2e8a5a85ab4dc25c869e5a1 . %{size_header})",
  "-d{\"model\":\"test\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a large language model living in Emacs and a helpful assistant. Respond concisely.\"},{\"role\":\"user\",\"content\":\"Can you hear me?\"}],\"stream\":false,\"temperature\":1.0}",
  "-HContent-Type: application/json",
  "http://10.0.0.8:8080/v1/chat/completions"
]

Mar 11 '24 20:03 nameiwillforget

What happens if you run that curl command manually?


curl --disable --location --silent --compressed -XPOST -y300 -Y1 -D- \
     -w'(8866ba70f2e8a5a85ab4dc25c869e5a1 . %{size_header})' \
     -d"{\"model\":\"test\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a large language model living in Emacs and a helpful assistant. Respond concisely.\"},{\"role\":\"user\",\"content\":\"Can you hear me?\"}],\"stream\":false,\"temperature\":1.0}" -H"Content-Type: application/json" \
     'http://10.0.0.8:8080/v1/chat/completions'

Mar 12 '24 00:03 karthink

I get the following output:

(8866ba70f2e8a5a85ab4dc25c869e5a1 . 0)

I successfully contacted the model from the desktop immediately before that.
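
For reference, that single line is the marker appended by the -w option: curl replaces %{size_header} with the total number of header bytes it received, so a value of 0 means no HTTP response headers arrived at all. Below is a minimal sketch for pulling that number out of such a line. The function name header_size is hypothetical and is not part of gptel or curl.

```shell
#!/bin/bash
# header_size MARKER
# Print the numeric part of a trailer line such as
#   (8866ba70f2e8a5a85ab4dc25c869e5a1 . 0)
# Prints nothing if the line does not have that shape.
# (header_size is an illustrative name, not a gptel or curl function.)
header_size() {
    printf '%s\n' "$1" | sed -n 's/^(\([0-9a-f]*\) \. \([0-9]*\))$/\2/p'
}
```

Calling header_size '(8866ba70f2e8a5a85ab4dc25c869e5a1 . 0)' prints 0, whereas the marker from the earlier successful local request carried 160.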

Mar 12 '24 10:03 nameiwillforget

This is a networking/connection issue, unrelated to gptel. I suggest checking if you can ping your desktop/laptop from the other device first.
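
A minimal sketch of such a check, run from the laptop, might look like the following. It is not taken from the thread: the function names check_endpoint and explain_curl_exit are hypothetical, the ping flags assume Linux iputils, and the exit-code meanings are those listed in the curl man page.

```shell
#!/bin/bash
# Translate the curl exit codes most relevant to a failed connection
# into a short hint (meanings as listed under EXIT CODES in 'man curl').
explain_curl_exit() {
    case "$1" in
        0)  echo "ok: an HTTP response was received" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect: nothing reachable on that address/port" ;;
        28) echo "timed out: packets may be dropped, e.g. by a firewall" ;;
        52) echo "connected, but the server sent no reply" ;;
        *)  echo "curl exit code $1: see 'man curl'" ;;
    esac
}

# check_endpoint HOST PORT
# Ping the host once, then try a plain HTTP request to HOST:PORT.
# Assumes Linux iputils ping, where -W is a timeout in seconds.
check_endpoint() {
    local host="$1" port="$2"
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
        echo "ping: $host is reachable"
    else
        echo "ping: no reply from $host"
    fi
    # No --fail here: any HTTP response, even an error page, counts as contact.
    curl --silent --show-error --connect-timeout 5 --output /dev/null \
         "http://$host:$port/"
    explain_curl_exit "$?"
}
```

For the setup in this thread the call would be check_endpoint 10.0.0.8 8080. If ping succeeds but curl cannot connect, it may be worth running ss -ltn on the desktop to see which address the server listens on: a listener on 127.0.0.1:8080 accepts only local connections, which would match requests working from the desktop itself but not from the laptop. llama.cpp's server has a --host option for choosing the listen address. Whether that was the cause here was never established.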

Apr 02 '24 23:04 karthink

@nameiwillforget Did you get gptel working as intended?

Oct 30 '24 03:10 karthink

No. I couldn't isolate the error, so I put it off. Now I'm using ChatGPT, which works fine. I'll close the issue.

Nov 04 '24 22:11 nameiwillforget
