Electric should be resilient to disconnections from Postgres
This applies whether the disconnection is caused by Postgres restarting or by network connectivity issues.
- [ ] Write a test to verify Electric's graceful handling of Postgres connection closures.
I suggest moving this out of Second Alpha, given that it will only be fully addressed by electric-sql/electric-next#34 and electric-sql/electric-next#106.
Hmm yeah — how much work is left to fix those @alco? It does seem like basic resiliency to starting / stopping backend services would be good to get in early as the goal is to make Electric stable for developing locally and deploying relatively simple applications. The theme of "Second Alpha" is basically "make all the normal things work".
https://github.com/electric-sql/electric-next/pull/34 has been ready for review since last week. I keep rebasing it from time to time to resolve conflicts. It solves the core need of being able to resume replication from Postgres after the replication connection is closed and reopened.
Now that I'm thinking of this, we're missing a test that would verify idempotent processing of transactions if it so happens that the replication connection closes right after we've persisted a transaction to the shape log but just before we have acknowledged its LSN to Postgres. Adding this as a TODO to the 2nd PR - https://github.com/electric-sql/electric-next/pull/106.
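One common way to get idempotent processing in this crash window is to record the commit LSN of the last transaction written to the log and skip any transaction at or below it. After a reconnect, Postgres streams again from the slot's last confirmed position, so a transaction that was persisted but never acknowledged gets resent. The sketch below is a minimal, language-agnostic illustration in Python, not Electric's Elixir code. `ShapeLog`, `apply` and the in-memory list are hypothetical stand-ins for the real shape log. Postgres writes LSNs as two hex numbers (`hi/lo`), which combine into one 64-bit value.

```python
from dataclasses import dataclass, field


def parse_lsn(text: str) -> int:
    """Convert a Postgres LSN such as '16/B374D848' into a comparable 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class ShapeLog:
    # Hypothetical in-memory stand-in for a persisted shape log.
    entries: list = field(default_factory=list)
    # Commit LSN of the last transaction already written to the log.
    last_lsn: int = 0

    def apply(self, commit_lsn: str, changes: list) -> bool:
        """Append a transaction's changes unless it was already persisted.

        Returns True if the changes were written, False if the transaction
        was a replay (e.g. resent after the connection dropped before the
        LSN was acknowledged to Postgres).
        """
        lsn = parse_lsn(commit_lsn)
        if lsn <= self.last_lsn:
            # Already persisted: skipping makes the replay a no-op.
            return False
        self.entries.extend(changes)
        # In a real system the log entries and last_lsn would be persisted
        # atomically, before the LSN is acknowledged to Postgres.
        self.last_lsn = lsn
        return True
```

A test for the scenario above would apply a transaction, "drop" the connection before acknowledging, replay the same transaction, and assert that the log contains it only once.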
The latter PR is missing one key thing: reconnection logic. To handle replication slot errors, I had to disable the auto-reconnection in Postgrex.ReplicationConnection that we'd been relying on, so we now need to put our own reconnection logic in place.
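Custom reconnection logic is often built as a retry loop with capped exponential backoff and jitter. The sketch below is a language-agnostic Python illustration, not the approach taken in Electric's Elixir code. `connect_with_backoff`, its parameters and the `connect` callable are hypothetical. It assumes a failed connection attempt raises `ConnectionError`.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, waiting longer after each failure.

    `connect` is a hypothetical callable that opens the replication
    connection and raises ConnectionError while Postgres is unreachable.
    With max_attempts=None it retries forever.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up and surface the last error
            # Exponential backoff (0.5s, 1s, 2s, ...) capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": wait a random time in [0, delay] so that many
            # clients do not all reconnect at the same moment.
            sleep(random.uniform(0, delay))
```

Injecting `sleep` keeps the loop testable without real waiting. After reconnecting, replication would resume from the slot's last confirmed LSN, so replayed transactions still need idempotent handling.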
Great! Let's get these reviewed and in early next week :shipit:
https://github.com/electric-sql/electric-next/pull/205 makes Electric handle Postgres disconnection gracefully and reconnect when it's back up.
There's no automated testing to verify this behaviour yet. I'll leave this issue open until we have a way to test the resilience.
@alco shall we create the test and close this?