Summarizer ignores keep_last_n_messages
Describe the bug. Once context window gets full the summarization of previous messages is triggered. As far as I can tell the planned behavior is to keep the last user messages of the block of messages to be summarized to keep context, e.g. if a user asks a question the agent should still know what question to answer after summarizing.
Now we had the problem that exactly this was not the case and the agent wasn't able to refer to this very last user message/question. After debugging this behavior it seems like the agent keeps exactly two messages created before the system_alert:
-
The System instructions, which doesn't make sense imho since we have it twice in our composed system prompt then and it already makes up for a big part of the context window. I can see here that this is expected behavior but I'm not sure why.
-
The last user message which is (in our case) often a heartbeat message when the system_alert is triggered after a tool call.
I tried fixing it by setting env variable letta_summarizer_keep_last_n_messages=3 (also tried higher values) but it's always summarizing it down to only these 2 messages (see logs). Am I misunderstanding anything or do I need to set another env variable to get expected behavior?
2025-03-21 12:12:46 Letta.letta.agent - INFO - System message token count=1280
2025-03-21 12:12:46 Letta.letta.agent - INFO - token_counts_no_system=[97, 87, 65, 62, 98, 86, 62, 100, 88, 62, 100, 85, 66, 115, 86, 81, 106, 85, 62, 100, 84, 67, 121, 87, 74, 435, 87, 68, 1049, 88, 63, 1051, 88, 66, 1050, 86]
2025-03-21 12:12:46 Letta.letta.agent - INFO - desired_token_count_to_summarize=5275
2025-03-21 12:12:46 Letta.letta.agent - WARNING - Breaking summary cutoff early on role=MessageRole.tool because we hit the `keep_last_n_messages`=3
2025-03-21 12:12:46 Letta.letta.agent - INFO - Evicting 33/37 messages...
2025-03-21 12:12:46 Letta.letta.agent - INFO - Attempting to summarize 33 messages of 37
2025-03-21 12:12:48 httpx - INFO - HTTP Request: POST https://haimdall.dev.ella-lab.io/v1/chat/completions "HTTP/1.1 200 OK"
2025-03-21 12:12:48 Letta.letta.agent - INFO - Got summary: I started by completing my bootup sequence and logged the user's first login. The user greeted me multiple times, and I responded with friendly messages to maintain a casual tone. They shared their favorite color, which I noted for personalization. Later, the user checked in on me, requested a lengthy message for testing, and asked for repetitions of that message several times. I engaged with their requests and kept the conversation flowing.
2025-03-21 12:12:48 Letta.letta.agent - INFO - Packaged into message: {
2025-03-21 12:12:48 "type": "system_alert",
2025-03-21 12:12:48 "message": "Note: prior messages (34 of 38 total messages) have been hidden from view due to conversation memory constraints.\nThe following is a summary of the previous 33 messages:\n I started by completing my bootup sequence and logged the user's first login. The user greeted me multiple times, and I responded with friendly messages to maintain a casual tone. They shared their favorite color, which I noted for personalization. Later, the user checked in on me, requested a lengthy message for testing, and asked for repetitions of that message several times. I engaged with their requests and kept the conversation flowing.",
2025-03-21 12:12:48 "time": "2025-03-21 11:12:48 AM UTC+0000"
2025-03-21 12:12:48 }
2025-03-21 12:12:49 Letta.letta.agent - INFO - Ran summarizer, messages length 37 -> 2
2025-03-21 12:12:49 Letta.letta.agent - INFO - Summarizer brought down total token count from 7537 -> 1463
Please describe your setup
- [X] How did you install letta?
- Docker deployment
- [X] Describe your setup
- GCP
Describe the solution you'd like A clear and concise description of what you want to happen.
Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.
Additional context Add any other context or screenshots about the feature request here.
It seems like we're only keeping the system message and prepending the summary message. Shouldn't there also be a part where either in_context_messages[cutoff:] are prepended or all messages except of system_message (see comment above) plus in_context_messages[cutoff:] are trimmed in trim_all_in_context_messages_except_system?
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.