langgraph-checkpoint-postgres (psycopg.OperationalError: sending query and params failed: SSL error: bad length) encountered across multiple versions
Checked other resources
- [x] This is a bug, not a usage question. For questions, please use GitHub Discussions.
- [x] I added a clear and detailed title that summarizes the issue.
- [x] I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
- [x] I included a self-contained, minimal example that demonstrates the issue INCLUDING all the relevant imports. The code run AS IS to reproduce the issue.
Example Code
from psycopg_pool import AsyncConnectionPool
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from langgraph.prebuilt import create_react_agent

# conninfo, llm, build_tools, _modify_messages, search_params and conversation_id
# are defined elsewhere in the application and were not included in the report.
connection_kwargs = {"autocommit": True, "prepare_threshold": 0}


async def chat():
    async with AsyncConnectionPool(conninfo=conninfo, max_size=20, kwargs=connection_kwargs) as pool:
        graph = create_react_agent(
            llm,
            build_tools,
            messages_modifier=_modify_messages,
            checkpointer=AsyncPostgresSaver(pool),  # type:ignore[arg-type]
        )
        async for event in graph.astream_events(
            {"messages": [("human", search_params.question)]},
            config={"configurable": {"thread_id": conversation_id, "recursion_limit": 20}},
            stream_mode="values",
            version="v2",
        ):
            ...  # event handling was not included in the report
Error Message and Stack Trace (if applicable)
INSERT INTO checkpoints ( thread_id, checkpoint_ns, checkpoint_id, parent_checkpoint_id, checkpoint, metadata )
VALUES ( ? ) ON CONFLICT ( thread_id, checkpoint_ns, checkpoint_id ) DO
UPDATE SET checkpoint = EXCLUDED.checkpoint, metadata = EXCLUDED.metadata
psycopg.OperationalError: sending query and params failed: SSL error: bad length
File "/app/app/search/ai_models.py", line 315, in chat
async for event in graph.astream_events(
File "/app/venv/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 1386, in astream_events
async for event in event_stream:
File "/app/venv/lib/python3.12/site-packages/langchain_core/tracers/event_stream.py", line 1012, in _astream_events_implementation_v2
await task
File "/app/venv/lib/python3.12/site-packages/langchain_core/tracers/event_stream.py", line 967, in consume_astream
async for _ in event_streamer.tap_output_aiter(run_id, stream):
File "/app/venv/lib/python3.12/site-packages/langchain_core/tracers/event_stream.py", line 203, in tap_output_aiter
async for chunk in output:
File "/app/venv/lib/python3.12/site-packages/langgraph/pregel/__init__.py", line 1832, in astream
async with AsyncPregelLoop(
^^^^^^^^^^^^^^^^
File "/app/venv/lib/python3.12/site-packages/langgraph/pregel/loop.py", line 1035, in __aexit__
return await asyncio.shield(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 754, in __aexit__
raise exc_details[1]
File "/usr/local/lib/python3.12/contextlib.py", line 737, in __aexit__
cb_suppress = await cb(*exc_details)
^^^^^^^^^^^^^^^^^^^^^^
File "/app/venv/lib/python3.12/site-packages/langgraph/pregel/executor.py", line 200, in __aexit__
raise exc
File "/app/venv/lib/python3.12/site-packages/langgraph/pregel/loop.py", line 957, in _checkpointer_put_after_previous
await cast(BaseCheckpointSaver, self.checkpointer).aput(
File "/app/venv/lib/python3.12/site-packages/langgraph/checkpoint/postgres/aio.py", line 270, in aput
await cur.execute(
File "/app/venv/lib/python3.12/site-packages/ddtrace/contrib/dbapi_async.py", line 136, in execute
return await self._trace_method(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/venv/lib/python3.12/site-packages/ddtrace/contrib/dbapi_async.py", line 105, in _trace_method
return await method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/venv/lib/python3.12/site-packages/psycopg/cursor_async.py", line 97, in execute
raise ex.with_traceback(None)
psycopg.OperationalError: sending query and params failed: SSL error: bad length
SSL SYSCALL error: EOF detected
Description
I'm hitting the following error with langgraph-checkpoint-postgres:
psycopg.OperationalError: sending query and params failed: SSL error: bad length
SSL SYSCALL error: EOF detected
NOTE: I have tried multiple versions of langgraph-checkpoint-postgres, i.e. 2.0.9, 2.0.11, 2.0.13, and 2.0.15.
System Info
langchain = "0.3.11"
langchain-community = "0.3.11"
langchain-experimental = "0.3.3"
langchain-openai = "0.2.12"
langchain-postgres = "0.0.12"
langgraph = "0.2.58"
psycopg = { extras = ["binary"], version = "3.2.3" }
psycopg-pool = "3.2.3"
sqlalchemy = { version = "2.0.36", extras = ["asyncio"] }
sqlmodel = "0.0.22"
asyncpg = "0.30.0"
langgraph-checkpoint-postgres = "2.0.15"
I have the exact same issue.
Sometimes, and I can't figure out exactly when, so I can't reproduce it reliably, the following is logged:
2025-03-06 14:00:53 | W | psycopg | error ignored terminating <psycopg.AsyncPipeline [BAD] at 0xffff482543e0>: the connection is lost
2025-03-06 14:00:53 | W | psycopg.pool | discarding closed connection: <psycopg.AsyncConnection [BAD] at 0xffff6408b080>
And then there is a failure:
SSL error: bad length
SSL SYSCALL error: EOF detected
Traceback (most recent call last):
File "/app/dojo/common/asyncio.py", line 39, in wrap_event_iterator_with_keep_alive
event = event_task.result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 1389, in astream_events
async for event in event_stream:
File "/usr/local/lib/python3.12/site-packages/langchain_core/tracers/event_stream.py", line 1012, in _astream_events_implementation_v2
await task
File "/usr/local/lib/python3.12/site-packages/langchain_core/tracers/event_stream.py", line 967, in consume_astream
async for _ in event_streamer.tap_output_aiter(run_id, stream):
File "/usr/local/lib/python3.12/site-packages/langchain_core/tracers/event_stream.py", line 203, in tap_output_aiter
async for chunk in output:
File "/usr/local/lib/python3.12/site-packages/langgraph/pregel/__init__.py", line 2227, in astream
async with AsyncPregelLoop(
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/langgraph/pregel/loop.py", line 1109, in __aexit__
return await exit_task
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/contextlib.py", line 754, in __aexit__
raise exc_details[1]
File "/usr/local/lib/python3.12/contextlib.py", line 737, in __aexit__
cb_suppress = await cb(*exc_details)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/langgraph/pregel/executor.py", line 206, in __aexit__
raise exc
File "/usr/local/lib/python3.12/site-packages/langgraph/pregel/loop.py", line 1023, in _checkpointer_put_after_previous
await prev
File "/usr/local/lib/python3.12/site-packages/langgraph/pregel/loop.py", line 1025, in _checkpointer_put_after_previous
await cast(BaseCheckpointSaver, self.checkpointer).aput(
File "/usr/local/lib/python3.12/site-packages/langgraph/checkpoint/postgres/aio.py", line 261, in aput
await cur.executemany(
File "/usr/local/lib/python3.12/site-packages/psycopg/cursor_async.py", line 132, in executemany
raise ex.with_traceback(None)
psycopg.OperationalError: sending prepared query failed: SSL error: bad length
SSL SYSCALL error: EOF detected
Any chance you might be running out of disk space? I saw this: https://stackoverflow.com/questions/63028407/psycopg2-databaseerror-ssl-error-bad-length. Not 100% sure it's relevant.
At least on my side, I don't think that's it. This only happens sometimes, not on all requests...
The two log messages below always precede the psycopg.OperationalError. They appear first, and then the next request that comes in blows up partway through processing, at the point where it stores the checkpoint.
2025-03-06 14:00:53 | W | psycopg | error ignored terminating <psycopg.AsyncPipeline [BAD] at 0xffff482543e0>: the connection is lost
2025-03-06 14:00:53 | W | psycopg.pool | discarding closed connection: <psycopg.AsyncConnection [BAD] at 0xffff6408b080>
If we could at least catch and handle this error, we could mitigate it. As it stands, some requests fail because it blows up mid-run with no way to deal with it.
@antonioalegria any chance you have really large messages in react agent? potentially tool messages with large content?
Yes, there is that chance, definitely.
I'm also exploring this idea.. great chat btw
Do you need any more info?
Are there updates on this issue? Thank you so much!
We're occasionally facing the same issue and haven't been able to reproduce it locally. Last time the error occurred, AWS RDS monitoring showed that a single database connection dropped at the same moment. We're not sure if that's the cause or just a side effect, but thought it might be useful context.
There was 15 GB of free disk space at the time. It happened on the first HumanMessage with the content "hi", so it's unlikely to be related to message size.
langgraph: 0.2.67
langgraph-checkpoint: 2.0.24
langgraph-checkpoint-postgres: 2.0.13
That's quite interesting @nadavperetz .. thanks for clarifying the context size issue.
Same error here, any ideas on causes or solutions?
Did you try updating langgraph-checkpoint-postgres to the latest version?
Any update or has anybody else been able to resolve this? Seeing this issue with the latest version as well.
Any update on this? I am experiencing the same issue. I will note that I recently started using a data lookup in a prompt that pulls in a large amount of text (100,000+ characters). I don't know if it's related, but the issue is intermittent. I get the same two errors others have mentioned:
OperationalError: sending query and params failed: SSL error: bad length
SSL SYSCALL error: EOF detected
Logged errors:
error ignored terminating <psycopg.Pipeline [BAD] at 0x1f430b99d00>: the connection is lost
discarding closed connection: <psycopg.Connection [BAD] at 0x1f42668ae70>
Additionally, if I remove the large prompt the issue seems to go away. Is there a limit on message size in the checkpoint system?
Environment:
langgraph==0.3.30
psycopg[binary,pool]==3.2.9
langgraph-checkpoint-postgres==2.0.21
In my case, there is a large amount of data stored in the state and not so much in the prompt/messages. The confusing part is that the issue is intermittent. I'm still testing to confirm, but it seems less likely to occur when checkpoint.setup doesn't need to set up new tables.
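Since payload size keeps coming up as a suspect, one way to check the correlation is to log a rough size for the state around each run and compare it with when the failures happen. This is only a sketch: `approx_payload_bytes` is a hypothetical helper, and plain JSON is an assumption that only approximates whatever the checkpointer's own serializer actually writes.

```python
import json


def approx_payload_bytes(state: dict) -> int:
    """Rough size, in bytes, of a state dict once serialized as JSON."""
    # default=str stops non-JSON objects (e.g. message objects) from raising;
    # their string form is only an approximation of the real stored size.
    return len(json.dumps(state, default=str).encode("utf-8"))


# Hypothetical state: one short message plus a large lookup result.
state = {"messages": [("human", "hi")], "lookup": "x" * 100_000}
print(approx_payload_bytes(state))
```

If failures show up at both small and large sizes, as one report above suggests, that would point away from payload size as the only trigger.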
@theory2 @TheTreeHacker Is this occurring for you both when using a cloud-hosted Postgres database? What about a local Postgres database with Docker? I found this was occurring with cloud databases (Azure and AWS), but not with local DBs.
@jdg9vr Yes, I am using a Postgres DB v15.12 in Azure. I haven't tried a local version.
Yes, I'm seeing this with Postgres on RDS. Haven't tested local.
same issue
Cloud database for me as well.
This SSL error could well be masking another underlying issue, which would explain why it isn't necessarily raised locally. The SSL connection failure appears to be a symptom rather than the root cause.
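If the underlying problem is that a cloud database or something in between silently drops pooled connections, one unconfirmed mitigation is to have the pool validate and recycle connections more aggressively. The sketch below assumes psycopg_pool 3.2 or later (where the `check` parameter and `AsyncConnectionPool.check_connection` are available). `build_pool_options` and `make_pool` are hypothetical names, and every timeout value is a guess that would need tuning. Nothing in this thread confirms that it fixes the error.

```python
def build_pool_options(max_size: int = 20) -> dict:
    """Keyword arguments for psycopg_pool.AsyncConnectionPool (values are guesses)."""
    return {
        "max_size": max_size,
        "max_idle": 300.0,       # seconds an unused connection may sit before the pool shrinks
        "max_lifetime": 1800.0,  # seconds after which a connection is closed and replaced
        "kwargs": {
            "autocommit": True,
            "prepare_threshold": 0,
            # libpq TCP keepalive settings, passed through to each connection
            "keepalives": 1,
            "keepalives_idle": 30,
            "keepalives_interval": 10,
            "keepalives_count": 3,
        },
    }


def make_pool(conninfo: str):
    # Imported here so build_pool_options() can be inspected without psycopg installed.
    from psycopg_pool import AsyncConnectionPool

    return AsyncConnectionPool(
        conninfo=conninfo,
        open=False,  # open later with `await pool.open()` or `async with pool:`
        # Check each connection when it is handed out; a broken one is discarded.
        check=AsyncConnectionPool.check_connection,
        **build_pool_options(),
    )
```

The resulting pool would be used in place of the one in the original example, e.g. `async with make_pool(conninfo) as pool:`. The check costs one extra round trip per checkout, which seems a reasonable trade if it avoids handing a dead connection to the checkpointer.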
@vbarda Is anyone looking into this at LangChain?
I am also experiencing this issue
At a minimum, we need a workaround or a way to recover from these errors without failing the whole request.
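Until there is a proper fix, a stopgap on the application side could be to retry the failed call. Below is a minimal sketch of a generic async retry with exponential backoff. `retry_async` is a hypothetical helper, not part of langgraph or psycopg, and the attempt count and delays are arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    make_call: Callable[[], Awaitable[T]],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await make_call(), retrying with exponential backoff on the given errors."""
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Usage would look something like `await retry_async(lambda: graph.ainvoke(inputs, config), (psycopg.OperationalError,))`. Note that a retry re-runs the call from the caller's side, so it only fits when the steps are safe to repeat, and a streaming call such as `astream_events` would need the whole consumption loop wrapped rather than a single await.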
I'm experiencing the same issue in the cloud.
I'm getting the same error. Any help would be greatly appreciated.
Would love to get some update on this from the team - if it's being addressed, if there is a workaround, etc. Thank you!
@antonioalegria I second this^
Seconding this too!