Support for the OpenAI Responses API
This reminds me of: https://xkcd.com/927
Jokes aside, it adds two interesting things:
- Allowing requests to refer to previous responses: gptel already handles part of this, in that it constructs the history by parsing text properties. Could this be extended to additionally store the response_id and use that instead of the response text itself? (A rough sketch follows this list.)
- Built-in/hosted tools: I am not sure about computer use, but the other two (web-search and file-search) could be framed as special tools. If web-search/file-search is selected as a tool, additional transients could be shown to let the user modify settings specific to them. I've been playing around with some of this as part of the openai-assistant backend (which is being deprecated now) to query and display/update options when necessary, and that could be reused here.
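A rough sketch of the response_id idea. Everything here is hypothetical: the property name, the response-data/request plists, and the position variables are stand-ins for whatever gptel actually uses.

;; After inserting a response, stash the Responses API id on the text
;; (all names below are made up, for illustration only):
(put-text-property response-beg response-end
                   'gptel-response-id (plist-get response-data :id))

;; When building the next request, look up the id of the latest response
;; and send `previous_response_id' instead of replaying the whole buffer:
(plist-put request :previous_response_id
           (get-text-property (1- response-end) 'gptel-response-id))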
There are also some similarities to features in other backends (Anthropic citations, Gemini grounding), but I am not familiar enough with how they work to comment on them.
Details from:
- https://platform.openai.com/docs/api-reference/responses
- https://x.com/OpenAIDevs/status/1899531225468969240
The moment I read this announcement yesterday I knew it was going to be a bit of a mess. The Responses API essentially takes on more of the responsibilities of the LLM client to make the developer's job easier, but this means the client has to support both server-side and local state, which is messier.
You also have two sources of truth for the state now. Storing the resp_id is possible, but the user's impression is that the buffer contents represent the state of the conversation, so if they edit the buffer, which state do you pick?
I generally target the common denominator of the available APIs to keep the maintenance load under control. I'm content to wait and see if the Responses API enables anything the chat completions API doesn't. The special tools are available under different model names in the chat completions API.
Supporting the web search tool should be easy, but the file search tool is going to be a lot of work, as it requires a whole subsystem for sending and tracking uploads.
> You also have two sources of truth for the state now. Storing the resp_id is possible, but the user's impression is that the buffer contents represent the state of the conversation, so if they edit the buffer, which state do you pick?
That's fair. Also, the Responses API no longer seems to accept a list of messages as part of a request; it's just one input string now.
Well, I started playing around to see if I can get this to work. As with openai-assistant, the structure of the requests and responses is different here too. Like I mentioned above, in the request the API only takes a string as input. In the response, it's no longer choices[0].message.content but output[n].content.text, with multiple outputs per response. The tool structure in the request seems to be the same, but the response structure is different there as well. It's also not clear how to provide tool outputs with the Responses API.
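To make the shape difference concrete, here is a rough sketch of the two request bodies in the plist form gptel uses internally (values are illustrative; field names follow the public API docs):

;; Chat completions: the whole conversation is replayed every turn.
(:model "gpt-4o"
 :messages [(:role "user" :content "Hello")
            (:role "assistant" :content "Hi! How can I help?")
            (:role "user" :content "What's new?")])

;; Responses API: a single :input, which is a string (or, as it turns
;; out, an array of input items -- the implementation below relies on
;; this), with the reply arriving as output[n].content[m].text.
(:model "gpt-4o"
 :input "What's new?")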
> That's fair. Also, the Responses API no longer seems to accept a list of messages as part of a request; it's just one input string now.
This is actually how I imagined the chat completions API would work before it was released in 2023, back when I was thinking of writing a package for LLM interaction. I found the idea of sending the entire conversation every turn quite strange and wasteful. But gptel's buffer-based paradigm is designed around the strengths of the chat completions API, so here we are.
> Well, I started playing around to see if I can get this to work.
Thanks for looking into it. Let me know if you get somewhere interesting with it!
OpenAI now requires using the Responses API for o1-pro and o3-pro. Newer OpenAI models may do the same.
I just ran into this issue, too. By default, gptel uses the v1/chat/completions endpoint for OpenAI, but when trying to use this endpoint for o3-pro, the OpenAI API returns:
ChatGPT error: ((HTTP/2 404) invalid_request_error) This is not a chat model and thus not supported in the v1/chat/completions endpoint. Did you mean to use v1/completions?
So I then defined the following two model providers in my dot emacs:
(gptel-make-openai "ChatGPT"
:stream t
:key gptel-api-key)
(gptel-make-openai "o3-pro"
:stream t
:key gptel-api-key
:endpoint "/v1/completions"
:models '(o3-pro))
In gptel-mode I can select the ChatGPT backend to use the regular v1/chat/completions endpoint, and the o3-pro model from gptel-menu to use the completions endpoint. However, when I send a query to o3-pro, I get the following:
Querying o3-pro...
o3-pro error: ((HTTP/2 400) invalid_request_error) Missing required parameter: 'prompt'.
I then realized that gptel uses the following JSON body for all OpenAI-style requests to v1/chat/completions:
{
  "model": "..",
  "messages": []
}
But v1/completions apparently expects "prompt": [].
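For comparison, going by the public API docs the legacy endpoint expects something like {"model": "o3-pro", "prompt": ".."} in place of the messages array, so the two endpoints are not interchangeable. And given the note above that o3-pro requires the Responses API, the 404's suggestion to use v1/completions appears to be misleading; the Responses implementation discussed below seems to be the right way to reach these models.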
I cooked up an implementation for the OpenAI Responses API:
(cl-defstruct (gptel-openai-responses
               (:constructor gptel-openai--make-responses)
               (:copier nil)
               (:include gptel-openai)))

(cl-defun gptel-make-openai-responses
    (name &key curl-args models stream key request-params
          (header
           (lambda () (when-let* ((key (gptel--get-api-key)))
                        `(("Authorization" . ,(concat "Bearer " key))))))
          (host "api.openai.com")
          (protocol "https")
          (endpoint "/v1/responses"))
  "Create an OpenAI Responses API backend with NAME."
  (declare (indent 1))
  (let ((backend (gptel-openai--make-responses
                  :curl-args curl-args
                  :name name
                  :host host
                  :header header
                  :key key
                  :models (gptel--process-models models)
                  :protocol protocol
                  :endpoint endpoint
                  :stream stream
                  :request-params request-params
                  :url (if protocol
                           (concat protocol "://" host endpoint)
                         (concat host endpoint)))))
    (prog1 backend
      (setf (alist-get name gptel--known-backends
                       nil nil #'equal)
            backend))))
(defun gptel-openai-response--process-output (output-item info)
  ;; The streaming `output_item.done' item and the output items in a
  ;; non-streaming response have the same shape, so both paths land here.
  (pcase (plist-get output-item :type)
    ("function_call"
     (gptel--inject-prompt ; First add the tool call to the prompts list
      (plist-get info :backend)
      (plist-get info :data)
      (copy-sequence output-item)) ; copy so the changes below don't affect the injected prompt
     (ignore-errors (plist-put output-item :args
                               (gptel--json-read-string
                                (plist-get output-item :arguments))))
     (plist-put output-item :arguments nil)
     (plist-put info :tool-use
                (append
                 (plist-get info :tool-use)
                 (list output-item))))
    ("reasoning"
     (gptel--inject-prompt ; The Responses API expects reasoning blocks back
      (plist-get info :backend) (plist-get info :data) output-item)
     (plist-put info :reasoning
                (append
                 (plist-get info :reasoning)
                 (list (map-nested-elt output-item '(:summary :text))))))
    (x `("unhandled" ,x))))
(cl-defmethod gptel--request-data ((_backend gptel-openai-responses) prompts)
  "JSON encode PROMPTS for sending to the Responses API.
Reuse the chat completions request data, renaming :messages to :input."
  (let* ((prompts (cl-call-next-method))
         (p prompts))
    (while p
      (when (eq (car p) :messages)
        (setcar p :input))
      (setq p (cddr p)))
    prompts))
(cl-defmethod gptel--inject-prompt
  ((_backend gptel-openai-responses) data new-prompt &optional _position)
  "Append NEW-PROMPT to the :input vector in the request DATA."
  (when (keywordp (car-safe new-prompt)) ;Is new-prompt one or many?
    (setq new-prompt (list new-prompt)))
  (let ((prompts (plist-get data :input)))
    (plist-put data :input (vconcat prompts new-prompt))))
(cl-defmethod gptel-curl--parse-stream ((_backend gptel-openai-responses) info)
  "Parse an OpenAI Responses API data stream.
Return the text response accumulated since the last call to this
function.  Additionally, mutate state INFO to add tool-use
information if the stream contains it."
  (let* ((content-strs))
    (condition-case err
        (while (re-search-forward "^data:" nil t)
          (save-match-data
            (let ((json-response (save-excursion
                                   (gptel--json-read))))
              (pcase (plist-get json-response :type)
                ;; ("response.completed"
                ;;  ;; End-of-stream processing could go here.
                ;;  )
                ("response.output_text.delta"
                 (push (plist-get json-response :delta) content-strs))
                ("response.output_item.done"
                 (let ((output-item (plist-get json-response :item)))
                   (gptel-openai-response--process-output output-item info)))))))
      (error (goto-char (match-beginning 0))))
    (apply #'concat (nreverse content-strs))))
(cl-defmethod gptel--parse-response ((_backend gptel-openai-responses) response info)
  "Parse an OpenAI Responses API (non-streaming) RESPONSE and return response text.
Mutate state INFO with response metadata."
  (plist-put info :stop-reason
             (list (plist-get response :status)
                   (plist-get response :incomplete_details)))
  (plist-put info :output-tokens
             (map-nested-elt response '(:usage :total_tokens)))
  (cl-loop for output-item across (plist-get response :output)
           if (equal (plist-get output-item :type) "message")
           collect (map-nested-elt output-item '(:content 0 :text)) into return-val
           else
           do (gptel-openai-response--process-output output-item info)
           finally return (string-join return-val)))
(cl-defmethod gptel--parse-tool-results ((_backend gptel-openai-responses) tool-use)
  "Return a prompt containing tool call results in TOOL-USE."
  (mapcar
   (lambda (tool-call)
     (list
      :type "function_call_output"
      :call_id (plist-get tool-call :call_id)
      :output (plist-get tool-call :result)))
   tool-use))
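If you want to try the above, registering the backend could look something like this (a minimal sketch; the backend name and model list are placeholders):

(gptel-make-openai-responses "ChatGPT-Responses"
  :stream t
  :key gptel-api-key
  :models '(o3-pro))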
Initially I was worried that, as with openai-assistant (https://github.com/karthink/gptel/discussions/539), I would end up duplicating most of the openai implementation with minor adjustments. It turns out the two are quite different, and this one is also a lot easier to work with.
@karthink is this something worth merging into gptel, or should this also live outside? The above implements all of gptel's current features. I'll be adding built-in tools for the Responses API step by step, but those can live outside of gptel if they don't generalize well.
Updated implementation to handle annotations and to use web_search_preview and file_search:
(cl-defstruct (gptel-openai-responses
               (:constructor gptel-openai--make-responses)
               (:copier nil)
               (:include gptel-openai)))

(cl-defun gptel-make-openai-responses
    (name &key curl-args models stream key request-params
          (header
           (lambda () (when-let* ((key (gptel--get-api-key)))
                        `(("Authorization" . ,(concat "Bearer " key))))))
          (host "api.openai.com")
          (protocol "https")
          (endpoint "/v1/responses"))
  "Create an OpenAI Responses API backend with NAME."
  (declare (indent 1))
  (let ((backend (gptel-openai--make-responses
                  :curl-args curl-args
                  :name name
                  :host host
                  :header header
                  :key key
                  :models (gptel--process-models models)
                  :protocol protocol
                  :endpoint endpoint
                  :stream stream
                  :request-params request-params
                  :url (if protocol
                           (concat protocol "://" host endpoint)
                         (concat host endpoint)))))
    (prog1 backend
      (setf (alist-get name gptel--known-backends
                       nil nil #'equal)
            backend))))
(defun gptel-openai-response--process-output (output-item info)
  ;; The streaming `output_item.done' item and the output items in a
  ;; non-streaming response have the same shape, so both paths land here.
  (pcase (plist-get output-item :type)
    ("function_call"
     (gptel--inject-prompt ; First add the tool call to the prompts list
      (plist-get info :backend)
      (plist-get info :data)
      (copy-sequence output-item)) ; copy so the changes below don't affect the injected prompt
     (ignore-errors (plist-put output-item :args
                               (gptel--json-read-string
                                (plist-get output-item :arguments))))
     (plist-put output-item :arguments nil)
     (plist-put info :tool-use
                (append
                 (plist-get info :tool-use)
                 (list output-item))))
    ("reasoning"
     (gptel--inject-prompt ; The Responses API expects reasoning blocks back
      (plist-get info :backend) (plist-get info :data) output-item)
     (plist-put info :reasoning
                (append
                 (plist-get info :reasoning)
                 (list (map-nested-elt output-item '(:summary :text))))))
    (_ nil))) ; TODO: handle other output item types
(defun gptel-openai-response--process-annotation (annotation _info)
  "Return a string that can be appended to the content, or nil."
  (pcase (plist-get annotation :type)
    ("file_citation"
     (format "[file_citation:%s]" (plist-get annotation :file_id)))
    (_ nil))) ; TODO: handle other annotation types
(cl-defmethod gptel--request-data ((_backend gptel-openai-responses) prompts)
  "JSON encode PROMPTS for sending to the Responses API.
Reuse the chat completions request data, renaming :messages to :input
and appending any built-in tools."
  (let* ((prompts (cl-call-next-method))
         (p prompts))
    ;; Append built-in (hosted) tools to the regular tool list
    (when gptel-openai-responses--tools
      (plist-put prompts :tools
                 (vconcat (plist-get prompts :tools)
                          (mapcar (lambda (built-in-tool)
                                    (if (functionp built-in-tool)
                                        (funcall built-in-tool)
                                      built-in-tool))
                                  gptel-openai-responses--tools))))
    (while p
      (when (eq (car p) :messages)
        (setcar p :input))
      (setq p (cddr p)))
    prompts))
(cl-defmethod gptel--inject-prompt
  ((_backend gptel-openai-responses) data new-prompt &optional _position)
  "Append NEW-PROMPT to the :input vector in the request DATA."
  (when (keywordp (car-safe new-prompt)) ;Is new-prompt one or many?
    (setq new-prompt (list new-prompt)))
  (let ((prompts (plist-get data :input)))
    (plist-put data :input (vconcat prompts new-prompt))))
(cl-defmethod gptel-curl--parse-stream ((_backend gptel-openai-responses) info)
  "Parse an OpenAI Responses API data stream.
Return the text response accumulated since the last call to this
function.  Additionally, mutate state INFO to add tool-use
information if the stream contains it."
  (let* ((content-strs))
    (condition-case err
        (while (re-search-forward "^data:" nil t)
          (save-match-data
            (let ((json-response (save-excursion
                                   (gptel--json-read))))
              (pcase (plist-get json-response :type)
                ;; ("response.completed"
                ;;  ;; End-of-stream processing could go here.
                ;;  )
                ("response.output_text.delta"
                 (push (plist-get json-response :delta) content-strs))
                ("response.output_text.annotation.added"
                 (let ((annotation (plist-get json-response :annotation)))
                   (push (gptel-openai-response--process-annotation annotation info)
                         content-strs)))
                ("response.output_item.done"
                 (let ((output-item (plist-get json-response :item)))
                   (gptel-openai-response--process-output output-item info)))))))
      (error (goto-char (match-beginning 0))))
    (apply #'concat (nreverse content-strs))))
(cl-defmethod gptel--parse-response ((_backend gptel-openai-responses) response info)
  "Parse an OpenAI Responses API (non-streaming) RESPONSE and return response text.
Mutate state INFO with response metadata."
  (plist-put info :stop-reason
             (list (plist-get response :status)
                   (plist-get response :incomplete_details)))
  (plist-put info :output-tokens
             (map-nested-elt response '(:usage :total_tokens)))
  (cl-loop for output-item across (plist-get response :output)
           if (equal (plist-get output-item :type) "message")
           collect
           (string-join
            (list (map-nested-elt output-item '(:content 0 :text))
                  (string-join
                   (delq nil ; drop annotation types we don't handle yet
                         (mapcar (lambda (annotation)
                                   (gptel-openai-response--process-annotation
                                    annotation info))
                                 (map-nested-elt output-item '(:content 0 :annotations))))
                   "\n"))
            "\n")
           into return-val
           else
           do (gptel-openai-response--process-output output-item info)
           finally return (string-join return-val)))
(cl-defmethod gptel--parse-tool-results ((_backend gptel-openai-responses) tool-use)
  "Return a prompt containing tool call results in TOOL-USE."
  (mapcar
   (lambda (tool-call)
     (list
      :type "function_call_output"
      :call_id (plist-get tool-call :call_id)
      :output (plist-get tool-call :result)))
   tool-use))
I could use gptel-tools for this, but that requires a much larger refactor. So here I'm using a separate variable, gptel-openai-responses--tools, to store the built-in tools:
- To use web_search_preview:

  (setq gptel-openai-responses--tools (list '(:type "web_search_preview")))

- To use file_search:

  ;; Get the vector store id from the OpenAI dashboard
  (setq gptel-openai-responses--tools
        (list '(:type "file_search" :vector_store_ids ["vs_XXXXXXXXXXXXX"])))
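Since the gptel--request-data method above funcalls any entry that is a function (that's what the functionp check is for), a tool spec can also be computed at request time. A purely illustrative sketch:

(setq gptel-openai-responses--tools
      (list (lambda ()
              ;; Illustrative only: ask for the vector store id when the
              ;; request is built, instead of hard-coding it.
              (list :type "file_search"
                    :vector_store_ids
                    (vector (read-string "Vector store id: "))))))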
For anyone interested in using the above, here's a simple transient that adds another column to gptel-menu:
(defclass my/add-to-list-switch (transient-variable)
  ((target-value :initarg :target-value)
   (target-list :initarg :target-list)
   (format :initarg :format :initform " %k %d")))

(cl-defmethod transient-infix-read ((_obj my/add-to-list-switch))
  ;; Do nothing: toggling happens in `transient-infix-set'.
  nil)

(cl-defmethod transient-infix-set ((obj my/add-to-list-switch) _)
  ;; Toggle membership of `target-value' in the list named by `target-list'.
  (if (member (oref obj target-value) (symbol-value (oref obj target-list)))
      (set (oref obj target-list)
           (delete (oref obj target-value) (symbol-value (oref obj target-list))))
    (set (oref obj target-list)
         (append (symbol-value (oref obj target-list))
                 (list (oref obj target-value)))))
  (transient-setup))

(cl-defmethod transient-format-description ((obj my/add-to-list-switch))
  (propertize (transient--get-description obj) 'face
              (if (member (oref obj target-value) (symbol-value (oref obj target-list)))
                  'transient-value
                'transient-inactive-value)))
(defvar gptel-openai-responses--tools nil
  "Built-in (hosted) tools to include in Responses API requests.")

(transient-define-prefix gptel-openai-response-built-in-tools ()
  [["Built-in tools"
    ("wl" "web search (low context)" ""
     :class my/add-to-list-switch
     :target-value (:type "web_search_preview" :search_context_size "low")
     :target-list gptel-openai-responses--tools)
    ("wm" "web search (medium context)" ""
     :class my/add-to-list-switch
     :target-value (:type "web_search_preview" :search_context_size "medium")
     :target-list gptel-openai-responses--tools)
    ("wh" "web search (high context)" ""
     :class my/add-to-list-switch
     :target-value (:type "web_search_preview" :search_context_size "high")
     :target-list gptel-openai-responses--tools)
    ("fo" "File search (org)" ""
     :class my/add-to-list-switch
     :target-value (:type "file_search"
                    :vector_store_ids ["vs_XXXXX"]) ;; vector store id from the dashboard
     :target-list gptel-openai-responses--tools)
    ""
    ("DEL" "Remove all" (lambda ()
                          (interactive)
                          (setq gptel-openai-responses--tools nil)
                          (transient-setup))
     :transient t
     :if (lambda () gptel-openai-responses--tools))
    ("RET" "Done" transient-quit-one)]])
(transient-append-suffix 'gptel-menu '(0 -1)
  [:if (lambda () (gptel-openai-responses-p gptel-backend))
   ""
   (:info
    (lambda ()
      (concat
       "Built-in tools"
       (and gptel-openai-responses--tools
            (concat " (" (propertize (format "%d"
                                             (length gptel-openai-responses--tools))
                                     'face 'warning)
                    ")"))))
    :format "%d" :face transient-heading)
   (gptel-openai-response-built-in-tools :key "T" :description "Select")])
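With this in place, whenever the current backend is a gptel-openai-responses one, gptel-menu gains a "Built-in tools" column showing the number of selected tools, and pressing T opens the selector above; the selected tools are then appended to :tools by gptel--request-data on the next request.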