Consolidate advisory lock into replication connection
Each Electric instance currently uses 2 WAL sender connections to Postgres:
- One for the replication stream
- One for advisory lock coordination
This doubles the pressure on max_wal_senders. That is especially problematic for users with multiple databases on the same Postgres instance, because max_wal_senders is a server-wide limit shared by all databases. During the Oct 13 incident, this contributed to "max wal senders" errors when multiple sources tried to reconnect.
Proposed Solution: Merge the advisory lock acquisition into the replication connection itself, reducing WAL sender usage from 2 to 1 per instance.
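The budget arithmetic behind the 2-to-1 reduction can be sketched as follows. This is a minimal illustration, not Electric code. The helper names and the "2 senders reserved for other clients" figure are hypothetical. The only real values used are the server-wide nature of max_wal_senders and its PostgreSQL default of 10.

```python
# Hypothetical helpers for illustration only; inputs are not measured values.
def wal_senders_used(instances: int, senders_per_instance: int) -> int:
    """WAL senders consumed by Electric instances on one Postgres server."""
    return instances * senders_per_instance


def max_instances(max_wal_senders: int, senders_per_instance: int,
                  reserved: int = 0) -> int:
    """How many instances fit after reserving senders for other clients.

    max_wal_senders is server-wide, so instances serving different
    databases on the same server all draw from this one pool.
    """
    return max(0, max_wal_senders - reserved) // senders_per_instance


# Default max_wal_senders = 10, with 2 senders assumed to be used elsewhere:
print(max_instances(10, 2, reserved=2))  # → 4 (two connections per instance)
print(max_instances(10, 1, reserved=2))  # → 8 (one connection per instance)
```

Under these assumptions, consolidating the lock doubles the number of instances a single Postgres server can host before hitting the limit.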
Requirements
- Acquire the advisory lock as the first step when establishing the replication connection
- If the replication connection dies, the lock is released automatically (session-level advisory locks are released when the backend exits)
- When the lock is lost but the connection is still alive, transition to a read-only/degraded mode instead of shutting down entirely
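The requirements above can be modelled as a small state machine. This is a sketch under stated assumptions, not Electric's implementation.

It relies on PostgreSQL 10 and later accepting plain SQL on a logical replication (replication=database) connection before streaming starts, so the lock can be taken on the same session that later runs START_REPLICATION.

The following are all hypothetical:
- `FakeReplicationConn`, `Instance` and the `Mode` names
- the WAITING state for "lock held elsewhere"

```python
from enum import Enum, auto


class Mode(Enum):
    STARTING = auto()
    WAITING = auto()       # lock held elsewhere; not streaming (retry policy assumed)
    ACTIVE = auto()        # holds the lock and streams from the slot
    READ_ONLY = auto()     # lock lost while the connection is alive: degraded mode
    DISCONNECTED = auto()  # connection died; Postgres released the lock itself


class FakeReplicationConn:
    """Hypothetical stand-in for a single walsender connection; it only
    records the commands a real connection would send."""

    def __init__(self, lock_available: bool = True):
        self.lock_available = lock_available
        self.commands: list[str] = []

    def try_advisory_lock(self, key: int) -> bool:
        self.commands.append(f"SELECT pg_try_advisory_lock({key})")
        return self.lock_available

    def start_replication(self, slot: str) -> None:
        # Output-plugin options (e.g. for pgoutput) are omitted in this sketch.
        self.commands.append(f"START_REPLICATION SLOT {slot} LOGICAL 0/0")


class Instance:
    def __init__(self, conn: FakeReplicationConn, lock_key: int, slot: str):
        self.conn = conn
        self.lock_key = lock_key
        self.slot = slot
        self.mode = Mode.STARTING

    def connect(self) -> Mode:
        # Step 1: take the lock on the same connection that will stream.
        if not self.conn.try_advisory_lock(self.lock_key):
            self.mode = Mode.WAITING
            return self.mode
        # Step 2: only the lock holder starts streaming from the slot.
        self.conn.start_replication(self.slot)
        self.mode = Mode.ACTIVE
        return self.mode

    def on_lock_lost(self) -> None:
        # Degrade instead of shutting down; reads can keep being served.
        if self.mode is Mode.ACTIVE:
            self.mode = Mode.READ_ONLY

    def on_connection_closed(self) -> None:
        # Session-level advisory locks are released when the backend exits,
        # so there is no explicit unlock step here.
        self.mode = Mode.DISCONNECTED
```

Because the lock and the stream share one session, they share fate. A dead connection can never leave a lock behind, which is what removes the need for the second WAL sender.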
We discussed potential improvements to the lockbreaker:
- If the lock moves into the replication connection, the lockbreaker can handle both the lock and the replication connection at once
- The advisory lock stays held for as long as the backend holding it stays alive
- The lockbreaker can identify orphaned backends by advisory lock name and kill them
- Multiple connections on the same slot are prevented, because the lock admits only one holder
- We still need to verify how the replication slot's "active" state behaves when a backend is stuck
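A lockbreaker along these lines might use helpers like the following. This is a sketch, not Electric's lockbreaker.

It assumes the lock "name" has already been mapped to a signed bigint key; how Electric derives that key is not shown here. It also assumes psycopg-style `%(name)s` parameters. The helper and constant names are hypothetical.

The query logic relies only on documented PostgreSQL behaviour:
- pg_locks shows a bigint advisory key split into `classid` (high 32 bits) and `objid` (low 32 bits), with `objsubid = 1`.
- pg_replication_slots exposes `active` and `active_pid`.

```python
# Hypothetical lockbreaker helpers; lock name -> bigint key mapping is assumed.

def advisory_key_parts(key: int) -> tuple[int, int]:
    """Split a bigint advisory-lock key into the (classid, objid) pair that
    pg_locks reports for it (high 32 bits, low 32 bits, objsubid = 1)."""
    if not -(2**63) <= key < 2**63:
        raise ValueError("advisory lock key must fit in a signed bigint")
    unsigned = key & 0xFFFF_FFFF_FFFF_FFFF  # two's-complement view of the key
    return unsigned >> 32, unsigned & 0xFFFF_FFFF


# Which backend holds the lock, and is it a walsender?
FIND_HOLDER_SQL = """
SELECT l.pid, a.backend_type, a.state, a.state_change
FROM pg_locks AS l
JOIN pg_stat_activity AS a ON a.pid = l.pid
WHERE l.locktype = 'advisory'
  AND l.granted
  AND l.objsubid = 1
  AND l.classid::bigint = %(classid)s
  AND l.objid::bigint = %(objid)s
  AND l.database = (SELECT oid FROM pg_database
                    WHERE datname = current_database())
"""

# Open question from the discussion: what does the slot report while its
# backend is stuck? Compare active_pid with the lock holder's pid.
SLOT_STATE_SQL = """
SELECT slot_name, active, active_pid
FROM pg_replication_slots
WHERE slot_name = %(slot_name)s
"""

# Terminating the holder ends its session. That releases the advisory lock
# and deactivates the slot in one step when both live on the same connection.
TERMINATE_SQL = "SELECT pg_terminate_backend(%(pid)s)"
```

Usage would look like `classid, objid = advisory_key_parts(key)` followed by running `FIND_HOLDER_SQL` with `{"classid": classid, "objid": objid}`. The holder's pid could then be checked against `active_pid` from `SLOT_STATE_SQL` before deciding whether to terminate it.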
Closing this, as it has been done.