[Bug]: PostgreSQL restart breaks LightRAG permanently (no auto-reconnect, and /health still returns OK)

Open akash171198 opened this issue 1 month ago • 1 comments

Do you need to file an issue?

[x] I have searched the existing issues and this bug is not already filed.
[x] I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

When using LightRAG with PostgreSQL storage (PGKVStorage, PGGraphStorage, PGVectorStorage, PGDocStatusStorage), the application does not recover after PostgreSQL restarts.

The LightRAG container keeps running.
The /health endpoint still returns 200.
Every RAG operation nevertheless fails internally, because the PostgreSQL connection pool is stale.

The only way to recover is to restart/redeploy the LightRAG container manually.

As a result, LightRAG is not resilient to database restarts or failovers.
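
The gap described above, where /health reports 200 while storage calls fail, could be closed by having the health handler run a cheap database probe before answering. The following is a minimal sketch, not LightRAG's actual handler: `health_status` and the `probe` callable are hypothetical names, and in a real deployment the probe would presumably run something like `SELECT 1` through the PostgreSQL pool.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple


async def health_status(
    probe: Callable[[], Awaitable[Any]],
    timeout: float = 2.0,
) -> Tuple[int, Dict[str, str]]:
    """Return (http_status, body) based on whether the database probe succeeds.

    `probe` is a hypothetical coroutine function supplied by the caller; it
    should raise (or hang) when the database cannot be reached.
    """
    try:
        # Bound the probe so a hung connection cannot stall the health endpoint.
        await asyncio.wait_for(probe(), timeout=timeout)
    except Exception as exc:  # also covers the timeout raised by wait_for
        return 503, {"status": "unhealthy", "database": type(exc).__name__}
    return 200, {"status": "healthy", "database": "ok"}
```

The web layer would then turn the returned status code into the HTTP response, so that container orchestrators and load balancers polling /health see the outage instead of a stale 200.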

Steps to reproduce

1. Deploy LightRAG using the official image:

   ghcr.io/hkuds/lightrag:latest

2. Configure PostgreSQL storage in .env:

   LIGHTRAG_KV_STORAGE=PGKVStorage
   LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage
   LIGHTRAG_GRAPH_STORAGE=PGGraphStorage
   LIGHTRAG_VECTOR_STORAGE=PGVectorStorage

3. Start LightRAG (Docker, ECS, or Compose).

4. Restart PostgreSQL:

   docker restart postgres

5. Try querying LightRAG again via the API or UI.

Expected Behavior

When PostgreSQL restarts, LightRAG should retry its database connections.
It should detect stale connections and rebuild the connection pool.
The /health endpoint should return UNHEALTHY when the database is unreachable.
LightRAG should survive a database failover without a manual container restart.
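
The retry-and-rebuild behaviour requested above can be sketched as a thin wrapper around the pool. This is an illustration under stated assumptions rather than LightRAG's code: `ReconnectingPool`, `create_pool`, and `operation` are hypothetical names, and the built-in `ConnectionError`/`OSError` stand in for whatever connection-loss exceptions the PostgreSQL driver actually raises.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Tuple, Type


class ReconnectingPool:
    """Lazily creates a pool and rebuilds it when an operation hits a dead connection."""

    def __init__(
        self,
        create_pool: Callable[[], Awaitable[Any]],  # hypothetical pool factory
        retryable: Tuple[Type[BaseException], ...] = (ConnectionError, OSError),
        max_attempts: int = 3,
        base_delay: float = 0.5,
    ) -> None:
        self._create_pool = create_pool
        self._retryable = retryable
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._pool: Optional[Any] = None
        self._lock = asyncio.Lock()

    async def _get_pool(self) -> Any:
        # The lock keeps concurrent callers from building several pools at once.
        async with self._lock:
            if self._pool is None:
                self._pool = await self._create_pool()
            return self._pool

    async def _discard(self, pool: Any) -> None:
        # Only drop the pool if nobody has replaced it in the meantime.
        async with self._lock:
            if self._pool is pool:
                self._pool = None

    async def run(self, operation: Callable[[Any], Awaitable[Any]]) -> Any:
        """Run `operation(pool)`, rebuilding the pool and retrying on connection loss."""
        for attempt in range(1, self._max_attempts + 1):
            pool = None
            try:
                pool = await self._get_pool()
                return await operation(pool)
            except self._retryable:
                if pool is not None:
                    await self._discard(pool)
                if attempt == self._max_attempts:
                    raise
                # Exponential backoff gives the database time to come back up.
                await asyncio.sleep(self._base_delay * 2 ** (attempt - 1))
```

With a real driver, the factory would build the driver's own pool, and the discarded pool should also be closed so its sockets are released. Blindly retrying is only safe for idempotent operations, so writes would need extra care.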

LightRAG Config Used


LightRAG Test Configuration

HOST=0.0.0.0
PORT=9621

Workspace

WORKSPACE=testworkspace

LLM (not relevant to DB issue, but included)

LLM_BINDING=ollama
LLM_MODEL=deepseek-v3.1:671b-cloud
LLM_BINDING_HOST=https://llm.test.com
LLM_BINDING_API_KEY=dummy-key
OLLAMA_LLM_NUM_CTX=32768

Embedding

EMBEDDING_BINDING=ollama
EMBEDDING_MODEL=nomic-embed-text:latest
EMBEDDING_BINDING_HOST=https://llm.test.com
EMBEDDING_BINDING_API_KEY=dummy-key
EMBEDDING_TIMEOUT=600

Storage (PostgreSQL)

LIGHTRAG_KV_STORAGE=PGKVStorage
LIGHTRAG_DOC_STATUS_STORAGE=PGDocStatusStorage
LIGHTRAG_GRAPH_STORAGE=PGGraphStorage
LIGHTRAG_VECTOR_STORAGE=PGVectorStorage

PostgreSQL Settings

POSTGRES_HOST=postgres
POSTGRES_PORT=5432
POSTGRES_USER=testuser
POSTGRES_PASSWORD=testpassword
POSTGRES_DATABASE=testdb
POSTGRES_MAX_CONNECTIONS=10
POSTGRES_WORKSPACE=testworkspace

PostgreSQL Vector Index

POSTGRES_VECTOR_INDEX_TYPE=HNSW
POSTGRES_HNSW_M=16
POSTGRES_HNSW_EF=200

Logging

LOG_LEVEL=INFO

Performance / Timeouts

HTTPX_TIMEOUT=300
LLM_TIMEOUT=300
MAX_ASYNC=4
MAX_PARALLEL_INSERT=2

Other optional config

ENABLE_LLM_CACHE=true
ENABLE_LLM_CACHE_FOR_EXTRACT=true
SUMMARY_LANGUAGE=English

Logs and screenshots

Additional Information

LightRAG Version:
Operating System:
Python Version:
Related Issues:

Nov 13 '25 17:11 akash171198