Reconsider llm.Conversation in favor of allowing prompts to be in reply to responses
I was originally planning on implementing tools (#898) as part of llm.Conversation but I've been having second thoughts about that.
There's a lot to be said for allowing prompts to reply to other responses, and having those form a chain.
Two big advantages:
- The in-memory (no SQLite database at all) story for the Python client library is simplified. Users don't have to remember to instantiate an `llm.Conversation` if they are going to send prompts that follow up on other prompts - particularly useful for implementing tools. (Sketched below.)
- Means I can support branching conversations. Maybe you start a conversation with a huge document (which gets into the prompt cache) and then ask multiple questions of that document as separate branches of replies to that initial prompt. Also good for a UI that lets power users really take control of the context in their ongoing conversations.
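Roughly, that in-memory flow could end up looking something like this - the `.reply()` method is the proposed part, everything else is existing API, and the model name is just an example:

```python
import llm

model = llm.get_model("gpt-4o-mini")

# Send a big document once - no llm.Conversation object needed
response = model.prompt("Summarize this document:\n" + open("report.txt").read())
print(response.text())

# Branch 1: follow up directly on that response
risks = response.reply("What are the three biggest risks it mentions?")
print(risks.text())

# Branch 2: a separate branch off the same original response
summary = response.reply("Rewrite the executive summary for a non-technical audience")
print(summary.text())
```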
I'm trying to figure out if this is a blocker for tools or not.
If I were to do this, here's one potential design:

- `Response` objects gain a `.reply(prompt)` method, which can be used to reply to that response with a fresh prompt and get back a fresh response.
- The `responses` database table gains a new nullable `reply_to_id` column which can be used to build a tree of responses. SQLite has had recursive CTEs for ages, so the SQL for working with these won't be too hard (see the sketch after this list).
- I could keep the concept of a conversation around to avoid breaking the existing CLI and databases, but a conversation could actually be a tree rather than a linear chain.
- Thankfully `llm -r` is not yet taken, so I can use `llm -r/--reply` to reply to the most recent message (similar to `llm -c/--continue` at the moment), and have `llm --reply-to ID` as the way of replying to a specific response ID.
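As a sanity check on the CTE idea, walking a reply chain back to its root could look roughly like this - just a sketch against the `reply_to_id` column from the migration below, not necessarily the query that would ship:

```python
import sqlite3

def reply_chain(db_path, response_id):
    "Return the chain of responses from the root prompt down to response_id."
    sql = """
    WITH RECURSIVE chain(id, reply_to_id, prompt, response, depth) AS (
        SELECT id, reply_to_id, prompt, response, 0
        FROM responses WHERE id = :id
        UNION ALL
        SELECT r.id, r.reply_to_id, r.prompt, r.response, chain.depth + 1
        FROM responses r JOIN chain ON r.id = chain.reply_to_id
    )
    SELECT id, prompt, response FROM chain ORDER BY depth DESC
    """
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql, {"id": response_id}).fetchall()
```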
I'm going to do a research spike on this in a branch.
I think this is the migration:
```python
@migration
def m018_replies(db):
    db["responses"].add_column("reply_to_id", str)
    db["responses"].add_foreign_key("reply_to_id", "responses", "id")
    db["responses"].transform(
        column_order=(
            "id",
            "reply_to_id",
            "model",
            "prompt",
            "system",
            "prompt_id",
            "system_id",
            "schema_id",
            "prompt_json",
            "options_json",
            "response",
            "response_json",
            "conversation_id",
            "first_token_ms",
            "duration_ms",
            "datetime_utc",
            "input_tokens",
            "output_tokens",
            "token_details",
        ),
    )
```
I guess this means the `Prompt()` class constructor needs to be able to take a `reply_to_id=` argument, or a `reply_to=Response()` argument, or both?

Would be neater to just do `reply_to=response` (which usually gets populated from calling `response.reply(...)`) and have the CLI DB handling code deal with IDs if necessary.
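Very roughly, that might fit together like this - illustrative class shapes only, not the real llm internals:

```python
class Prompt:
    def __init__(self, prompt, model, system=None, reply_to=None):
        self.prompt = prompt
        self.model = model
        self.system = system
        self.reply_to = reply_to  # a Response object (or None), never a raw ID


class Response:
    def __init__(self, prompt, model):
        self.prompt = prompt
        self.model = model
        self.id = None  # assigned when the response is logged

    def reply(self, prompt_text):
        # A fresh Prompt that points back at this response; execution elided here
        return Response(Prompt(prompt_text, model=self.model, reply_to=self), self.model)

    def reply_to_id(self):
        # Only the CLI/DB layer cares about IDs; the Python API stays object-based
        return self.prompt.reply_to.id if self.prompt.reply_to else None
```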
Also interesting: currently the `.execute()` method (including in all the plugins) has this signature: https://github.com/simonw/llm/blob/e78e1fceb273aeed467d80ec6c1c710a1433d3c1/llm/default_plugins/openai_models.py#L573-L578
A lot of those then have methods like this one:
https://github.com/simonw/llm/blob/e78e1fceb273aeed467d80ec6c1c710a1433d3c1/llm/default_plugins/openai_models.py#L468-L480
Note how `conversation` is a thing passed directly to `.execute()`, which is used to build up that previous `messages=` array.
In the new reply-to world that won't be necessary. Just being passed the prompt will be enough, since the code can then follow the `prompt.reply_to.prompt.reply_to` chain all the way to the top in order to rebuild the messages.
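For an OpenAI-style plugin the chain walk could look something like this - `prompt.prompt`, `prompt.system` and `response.text()` already exist, while `reply_to` is the proposed attribute:

```python
def build_messages(prompt):
    "Rebuild an OpenAI-style messages list by following prompt.reply_to back to the root."
    exchanges = []  # (user_text, assistant_text) pairs, newest first
    response = prompt.reply_to
    while response is not None:
        exchanges.append((response.prompt.prompt, response.text()))
        response = response.prompt.reply_to
    messages = []
    if prompt.system:
        messages.append({"role": "system", "content": prompt.system})
    for user_text, assistant_text in reversed(exchanges):  # oldest first
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": prompt.prompt})
    return messages
```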
This is great news for implementing tools, because it helps solve the thorny problem of keeping the ID from tool call requests so it can be matched up with the IDs in the tool call replies.
This change could be a breaking change for existing plugins. That's worth thinking about - it may be possible to keep them working by detecting whether their `execute()` method takes a `conversation` parameter and behaving differently for plugin models that don't (see the sketch below).
The `.execute()` signature is a bit of a mess already - perhaps I should rename that method to some new name to allow for a fresh design entirely?
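The detection piece could be as simple as inspecting the method signature - a sketch of an assumed compatibility shim, not existing llm code:

```python
import inspect

def call_execute(model, prompt, stream, response, conversation=None):
    "Pass conversation only to plugins whose execute() still accepts it."
    params = inspect.signature(model.execute).parameters
    if "conversation" in params:
        # Legacy plugin: keep passing the Conversation object
        return model.execute(prompt, stream, response, conversation)
    # New-style plugin: the prompt's reply_to chain carries the history
    return model.execute(prompt, stream, response)
```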
Current docs: https://github.com/simonw/llm/blob/e78e1fceb273aeed467d80ec6c1c710a1433d3c1/docs/plugins/tutorial-model-plugin.md#L224
Understanding execute()

The full signature of the `execute()` method is:

```python
def execute(self, prompt, stream, response, conversation):
```

The `prompt` argument is a `Prompt` object that contains the text that the user provided, the system prompt and the provided options.

`stream` is a boolean that says if the model is being run in streaming mode.

`response` is the `Response` object that is being created by the model. This is provided so you can write additional information to `response.response_json`, which may be logged to the database.

`conversation` is the `Conversation` that the prompt is a part of - or `None` if no conversation was provided. Some models may use `conversation.responses` to access previous prompts and responses in the conversation and use them to construct a call to the LLM that includes previous context.
I'm already reconsidering what `.execute()` does a bit for annotations in:
- #716
Now that I've built this:
- https://github.com/simonw/llm-fragments-github/issues/3
I can try this:
```bash
llm -f github:simonw/llm \
  -f issue:simonw/llm/938 \
  -m gemini-2.5-pro-exp-03-25 \
  --system 'muse on this issue, then propose a whole bunch of code to help implement it'
```
Gemini 2.5 Pro came up with a whole bunch of suggestions, and charged me 66.36 cents.
https://gist.github.com/simonw/a5f0c1e8184f4ddc8b71b30890fe690c