Reconsider llm.Conversation in favor of allowing prompts to be in reply to responses
I was originally planning on implementing tools (#898) as part of llm.Conversation but I've been having second thoughts about that.
There's a lot to be said for allowing prompts to reply to other responses, and having those form a chain.
Two big advantages:
- The in-memory (no SQLite database at all) story for the Python client library is simplified. Users don't have to remember to instantiate an `llm.Conversation` if they are going to send prompts that follow up on other prompts - particularly useful for implementing tools. (Sketched below.)
- Means I can support branching conversations. Maybe you start a conversation with a huge document (which gets into the prompt cache) and then ask multiple questions of that document as separate branches of replies to that initial prompt. Also good for a UI that lets power users really take control of the context in their ongoing conversations.
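Roughly, that in-memory flow could end up looking something like this - the `.reply()` method is the proposed part, everything else is existing API, and the model name is just an example:

```python
import llm

model = llm.get_model("gpt-4o-mini")

# Send a big document once - no llm.Conversation object needed
response = model.prompt("Summarize this document:\n" + open("report.txt").read())
print(response.text())

# Branch 1: follow up directly on that response
risks = response.reply("What are the three biggest risks it mentions?")
print(risks.text())

# Branch 2: a separate branch off the same original response
summary = response.reply("Rewrite the executive summary for a non-technical audience")
print(summary.text())
```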
I'm trying to figure out if this is a blocker for tools or not.
If I were to do this, here's one potential design:

- `Response` objects gain a `.reply(prompt)` method, which can be used to reply to that response with a fresh prompt and get back a fresh response.
- The `responses` database table gains a new nullable `reply_to_id` column which can be used to build a tree of responses. SQLite has had recursive CTEs for ages, so the SQL for working with these won't be too hard (see the sketch after this list).
- I could keep the concept of a conversation around to avoid breaking the existing CLI and databases, but a conversation could actually be a tree rather than a linear chain.
- Thankfully `llm -r` is not yet taken, so I can use `llm -r/--reply` to reply to the most recent message (similar to `llm -c/--continue` at the moment), and have `llm --reply-to ID` as the way of replying to a specific response ID.
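As a sanity check on the CTE idea, walking a reply chain back to its root could look roughly like this - just a sketch against the `reply_to_id` column from the migration below, not necessarily the query that would ship:

```python
import sqlite3

def reply_chain(db_path, response_id):
    "Return the chain of responses from the root prompt down to response_id."
    sql = """
    WITH RECURSIVE chain(id, reply_to_id, prompt, response, depth) AS (
        SELECT id, reply_to_id, prompt, response, 0
        FROM responses WHERE id = :id
        UNION ALL
        SELECT r.id, r.reply_to_id, r.prompt, r.response, chain.depth + 1
        FROM responses r JOIN chain ON r.id = chain.reply_to_id
    )
    SELECT id, prompt, response FROM chain ORDER BY depth DESC
    """
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql, {"id": response_id}).fetchall()
```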
I'm going to do a research spike on this in a branch.
I think this is the migration:
```python
@migration
def m018_replies(db):
    db["responses"].add_column("reply_to_id", str)
    db["responses"].add_foreign_key("reply_to_id", "responses", "id")
    db["responses"].transform(
        column_order=(
            "id",
            "reply_to_id",
            "model",
            "prompt",
            "system",
            "prompt_id",
            "system_id",
            "schema_id",
            "prompt_json",
            "options_json",
            "response",
            "response_json",
            "conversation_id",
            "first_token_ms",
            "duration_ms",
            "datetime_utc",
            "input_tokens",
            "output_tokens",
            "token_details",
        ),
    )
```
I guess this means the `Prompt()` class constructor needs to be able to take a `reply_to_id=` argument, or a `reply_to=Response()` argument, or both?

Would be neater to just do `reply_to=response` (which usually gets populated from calling `response.reply(...)`) and have the CLI DB handling code deal with IDs if necessary.
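Very roughly, that might fit together like this - illustrative class shapes only, not the real llm internals:

```python
class Prompt:
    def __init__(self, prompt, model, system=None, reply_to=None):
        self.prompt = prompt
        self.model = model
        self.system = system
        self.reply_to = reply_to  # a Response object (or None), never a raw ID


class Response:
    def __init__(self, prompt, model):
        self.prompt = prompt
        self.model = model
        self.id = None  # assigned when the response is logged

    def reply(self, prompt_text):
        # A fresh Prompt that points back at this response; execution elided here
        return Response(Prompt(prompt_text, model=self.model, reply_to=self), self.model)

    def reply_to_id(self):
        # Only the CLI/DB layer cares about IDs; the Python API stays object-based
        return self.prompt.reply_to.id if self.prompt.reply_to else None
```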
Also interesting: currently the `.execute()` method (including in all the plugins) has this signature: https://github.com/simonw/llm/blob/e78e1fceb273aeed467d80ec6c1c710a1433d3c1/llm/default_plugins/openai_models.py#L573-L578
A lot of those then have methods like this one:
https://github.com/simonw/llm/blob/e78e1fceb273aeed467d80ec6c1c710a1433d3c1/llm/default_plugins/openai_models.py#L468-L480
Note how `conversation` is a thing passed directly to `.execute()`, which is used to build up that previous `messages=` array.
In the new reply-to world that won't be necessary. Just being passed the prompt will be enough, since the code can then follow the `prompt.reply_to.prompt.reply_to` chain all the way to the top in order to rebuild the messages.
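For an OpenAI-style plugin the chain walk could look something like this - `prompt.prompt`, `prompt.system` and `response.text()` already exist, while `reply_to` is the proposed attribute:

```python
def build_messages(prompt):
    "Rebuild an OpenAI-style messages list by following prompt.reply_to back to the root."
    exchanges = []  # (user_text, assistant_text) pairs, newest first
    response = prompt.reply_to
    while response is not None:
        exchanges.append((response.prompt.prompt, response.text()))
        response = response.prompt.reply_to
    messages = []
    if prompt.system:
        messages.append({"role": "system", "content": prompt.system})
    for user_text, assistant_text in reversed(exchanges):  # oldest first
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": prompt.prompt})
    return messages
```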
This is great news for implementing tools, because it helps solve the thorny problem of keeping the ID from tool call requests so it can be matched up with the IDs in the tool call replies.
This change could be a breaking change for existing plugins. That's worth thinking about - it may be possible to keep them working by detecting whether their `execute()` method takes a `conversation` parameter and behaving differently for plugin models that don't (see the sketch below).
The `.execute()` signature is a bit of a mess already - perhaps I should rename that method to some new name to allow for a fresh design entirely?
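The detection piece could be as simple as inspecting the method signature - a sketch of an assumed compatibility shim, not existing llm code:

```python
import inspect

def call_execute(model, prompt, stream, response, conversation=None):
    "Pass conversation only to plugins whose execute() still accepts it."
    params = inspect.signature(model.execute).parameters
    if "conversation" in params:
        # Legacy plugin: keep passing the Conversation object
        return model.execute(prompt, stream, response, conversation)
    # New-style plugin: the prompt's reply_to chain carries the history
    return model.execute(prompt, stream, response)
```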
Current docs: https://github.com/simonw/llm/blob/e78e1fceb273aeed467d80ec6c1c710a1433d3c1/docs/plugins/tutorial-model-plugin.md#L224
Understanding execute()

The full signature of the `execute()` method is:

```python
def execute(self, prompt, stream, response, conversation):
```

The `prompt` argument is a `Prompt` object that contains the text that the user provided, the system prompt and the provided options.

`stream` is a boolean that says if the model is being run in streaming mode.

`response` is the `Response` object that is being created by the model. This is provided so you can write additional information to `response.response_json`, which may be logged to the database.

`conversation` is the `Conversation` that the prompt is a part of - or `None` if no conversation was provided. Some models may use `conversation.responses` to access previous prompts and responses in the conversation and use them to construct a call to the LLM that includes previous context.
I'm already reconsidering what `.execute()` does a bit for annotations in:
- #716
Now that I've built this:
- https://github.com/simonw/llm-fragments-github/issues/3
I can try this:
```bash
llm -f github:simonw/llm \
  -f issue:simonw/llm/938 \
  -m gemini-2.5-pro-exp-03-25 \
  --system 'muse on this issue, then propose a whole bunch of code to help implement it'
```
Gemini 2.5 Pro came up with a whole bunch of suggestions, and charged me 66.36 cents.
https://gist.github.com/simonw/a5f0c1e8184f4ddc8b71b30890fe690c