Consolidate advisory lock into replication connection

Open balegas opened this issue 2 months ago • 1 comments

Each Electric instance currently uses 2 WAL sender connections to Postgres:

  1. One for replication stream
  2. One for advisory lock coordination

This doubles the pressure on max_wal_senders, which is particularly problematic for users with multiple databases on the same Postgres instance, since max_wal_senders is a server-wide limit shared across all databases. During the Oct 13 incident, this contributed to "max wal senders" errors when multiple sources tried to reconnect.

Proposed Solution: Merge the advisory lock acquisition into the replication connection itself, reducing WAL sender usage from 2 to 1 per instance.
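
To make the saving concrete, here is a rough back-of-the-envelope sketch. The function names are hypothetical and the figure of six sources is only an illustration; the limit of 10 is the Postgres default for max_wal_senders. It ignores any other consumers of WAL senders, such as physical replicas.

```python
def wal_senders_needed(instances: int, connections_per_instance: int) -> int:
    """Total WAL sender slots consumed by a set of Electric instances."""
    return instances * connections_per_instance


def fits_within_limit(instances: int, connections_per_instance: int,
                      max_wal_senders: int) -> bool:
    """True if all instances can connect without exceeding the server-wide limit."""
    return wal_senders_needed(instances, connections_per_instance) <= max_wal_senders


# Illustrative numbers: six sources against the default max_wal_senders of 10.
current = wal_senders_needed(6, 2)    # two connections per instance today
proposed = wal_senders_needed(6, 1)   # one connection per instance after the change
```

With these assumed numbers the current layout needs 12 senders and overshoots the limit, while the consolidated layout needs 6 and fits.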

Requirements

  • Acquire advisory lock as first step when establishing replication connection
  • If replication connection dies, lock is automatically released
  • When lock is lost (but connection alive), transition to read-only/degraded mode instead of full shutdown
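
The requirements above amount to a small decision table over two facts: whether the replication connection is alive and whether the lock is held. The sketch below is only an illustration of that table; the mode names and the `next_mode` function are assumptions, not Electric's actual implementation.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"              # connection alive and lock held
    READ_ONLY = "read_only"        # connection alive but lock lost: degrade, do not shut down
    DISCONNECTED = "disconnected"  # connection dead: Postgres has released the lock


def next_mode(connection_alive: bool, lock_held: bool) -> Mode:
    """Pick the operating mode from connection and lock state."""
    if not connection_alive:
        # A session-level advisory lock is released when its session ends,
        # so a dead connection can never still hold the lock.
        return Mode.DISCONNECTED
    if lock_held:
        return Mode.ACTIVE
    return Mode.READ_ONLY
```

The key design point is that losing the lock and losing the connection are handled differently: only the latter forces a reconnect.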

balegas avatar Oct 14 '25 12:10 balegas

We discussed potential improvements to the lockbreaker:

  • If lock is moved into replication connection, lockbreaker can handle both
  • Advisory lock stays alive if backend doesn't die
  • Lockbreaker can identify and kill orphaned backends by advisory lock name
  • Prevents multiple connections on same slot (lock prevents duplicates)
  • Need to verify replication slot "active" state behavior with stuck backends
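
As a rough sketch of how a lockbreaker might find the backends to kill: Postgres advisory locks are keyed by integers rather than names, so this assumes the lock name has already been hashed to a 64-bit key. In pg_locks, a bigint advisory key appears with its high-order half in classid, its low-order half in objid, and objsubid equal to 1. The `LockRow` type and both function names are hypothetical, and the rows stand in for the result of querying pg_locks with locktype = 'advisory'.

```python
from dataclasses import dataclass
from typing import List, Tuple


def split_advisory_key(key: int) -> Tuple[int, int]:
    """Split a signed 64-bit advisory key into (classid, objid) as shown in pg_locks."""
    unsigned = key & 0xFFFFFFFFFFFFFFFF  # reinterpret as unsigned 64-bit
    return unsigned >> 32, unsigned & 0xFFFFFFFF


@dataclass(frozen=True)
class LockRow:
    """Hypothetical stand-in for one advisory-lock row from pg_locks."""
    pid: int
    classid: int
    objid: int
    objsubid: int
    granted: bool


def backends_holding(rows: List[LockRow], key: int, own_pid: int) -> List[int]:
    """Return pids of other backends that currently hold the given advisory lock."""
    classid, objid = split_advisory_key(key)
    return [
        r.pid
        for r in rows
        if r.granted
        and r.objsubid == 1          # 1 marks a single-bigint advisory key
        and r.classid == classid
        and r.objid == objid
        and r.pid != own_pid         # never target our own backend
    ]
```

Each returned pid could then be passed to pg_terminate_backend. Whether a stuck backend also leaves the replication slot marked active is the open question noted in the last bullet, and this sketch does not address it.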

balegas avatar Oct 14 '25 12:10 balegas

Closing this as it has been done

msfstef avatar Dec 03 '25 09:12 msfstef