thread_id too long for Postgres checkpoint
Checked other resources
- [x] This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
- [x] I added a clear and detailed title that summarizes the issue.
- [x] I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
- [x] I included a self-contained, minimal example that demonstrates the issue INCLUDING all the relevant imports. The code runs AS IS to reproduce the issue.
Example Code
```python
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph import StateGraph, MessagesState, START

DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres?sslmode=disable"

def chatbot(state: MessagesState):
    # minimal node so the graph can be compiled and invoked
    return {"messages": [{"role": "assistant", "content": "hello"}]}

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # create the checkpoint tables on first run
    builder = StateGraph(MessagesState)
    builder.add_node("chatbot", chatbot)
    builder.add_edge(START, "chatbot")
    graph = builder.compile(checkpointer=checkpointer)

    too_long_thread_id = "X" * 3000  # long enough to exceed Postgres' 2704-byte btree index row limit
    graph.invoke(
        {"messages": [{"role": "user", "content": "hi! i am Bob"}]},
        {"configurable": {"thread_id": too_long_thread_id}},
    )
```
Error Message and Stack Trace (if applicable)
Postgres error:
there is no unique or exclusion constraint matching the ON CONFLICT specification
Description
When a thread_id is absurdly long, Postgres fails to manage the thread_id index. Once this happens, all subsequent calls (even ones with a short thread_id) can fail.
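For anyone debugging this, the index named in the error can be inspected directly. This is only a sketch, assuming psycopg 3 (the driver PostgresSaver uses) and the default table names created by `checkpointer.setup()`; adjust the table names if your schema differs:

```python
import psycopg

DB_URI = "postgresql://postgres:postgres@localhost:5442/postgres?sslmode=disable"

with psycopg.connect(DB_URI) as conn:
    # List the indexes backing the checkpointer tables, including the
    # "checkpoint_blobs_pkey" index mentioned in the error message.
    rows = conn.execute(
        "SELECT indexname, indexdef FROM pg_indexes "
        "WHERE tablename IN ('checkpoints', 'checkpoint_blobs', 'checkpoint_writes')"
    ).fetchall()
    for name, definition in rows:
        print(name, definition)
```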
System Info
python -m langchain_core.sys_info
Solution
Constrain the size of the thread_id?
Would implementing a hash function be suitable, given that simply shortening the value might cause collisions that map different inputs to the same thread_id?
It could fix the issue, but then the thread_id stored in PostgreSQL would no longer match the one used by the application, which complicates checkpointing usage and debugging.
@Freezaa9 we can use a deterministic UUID that always produces the same value for a given phrase. Can you explain how PostgreSQL uses this thread_id, or how it is generated?
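As an illustration of that idea, the standard library can derive a stable, short UUID from an arbitrarily long key. This is only a sketch of the approach; LangGraph does not do this mapping for you, and the helper name is made up:

```python
import uuid

def short_thread_id(long_key: str) -> str:
    # uuid5 returns the same UUID for the same input, so the original
    # long key can always be re-mapped to the same 36-character thread_id.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, long_key))

config = {"configurable": {"thread_id": short_thread_id("X" * 3000)}}
```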
Update regarding the error returned by langgraph:
```
ProgramLimitExceeded: Index row size 4664 exceeds btree version 4 maximum 2704 for index "checkpoint_blobs_pkey" ...
__exit__ (/usr/local/lib/python3.12/site-packages/psycopg/pipeline.py:265)
```
So psycopg does return the correct error.
Would it be good practice for LangGraph to raise its own error specifying that thread_id should not be that long? Or should that be left entirely to the database used for checkpointing?
We should add validation in LangGraph to prevent this issue. I'm thinking we could add a warning when thread_id or checkpoint_ns is too long (>500 characters?), so users can catch this before hitting the actual PostgreSQL error.
Additionally, we should update the documentation to recommend using a UUID or hash for identifiers:
```python
import uuid

thread_id = str(uuid.uuid4())  # Recommended approach
```
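A rough sketch of what the proposed length check could look like; the 500-character threshold is just the number floated above, and the function is hypothetical, not existing LangGraph API:

```python
import warnings

MAX_IDENTIFIER_LENGTH = 500  # assumed threshold, open for discussion

def warn_if_identifier_too_long(name: str, value: str) -> None:
    # Warn before the value ever reaches Postgres, where an oversized key
    # would exceed the 2704-byte btree index row limit.
    if len(value) > MAX_IDENTIFIER_LENGTH:
        warnings.warn(
            f"{name} is {len(value)} characters long; values over "
            f"{MAX_IDENTIFIER_LENGTH} characters may exceed PostgreSQL's "
            "index row size limit. Consider a UUID or hash instead."
        )

warn_if_identifier_too_long("thread_id", "X" * 3000)
```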
@SunHuawei Yes, I think this is the way to go. The actual Postgres limit is 2704 bytes.
So I guess we can wait for the LangGraph team to validate this approach.
I'd prefer if we updated the docs to just recommend using UUIDs for conversation ids. That's what we expect people to do. The docs sometimes skip this just to reduce the size of the code snippet (i.e., they leave out the UUID generation part).
> We should add validation in LangGraph to prevent this issue. I'm thinking we could add a warning when thread_id or checkpoint_ns is too long (>500 characters?)
I'd be on board with doing this if folks want to open a PR
I'm a bit busy at the moment, but I'll try to make time for a PR soon. Thanks @eyurtsev