Epic: move outbound logical replication out of Beta
DoD
Logical replication is no longer in Beta on Neon, and `wal_level = logical` can be enabled by default on all projects on the Neon platform.
### Tasks & bugs to fix
- [ ] https://github.com/neondatabase/neon/issues/6370
- [ ] https://github.com/neondatabase/neon/issues/6371
- [ ] https://github.com/neondatabase/neon/pull/6221
- [ ] https://github.com/neondatabase/neon/issues/6182
- [ ] https://github.com/neondatabase/neon/issues/6229
- [ ] https://github.com/neondatabase/cloud/pull/9015
- [ ] https://github.com/neondatabase/neon/issues/6257
- [ ] https://github.com/neondatabase/neon/issues/7593
- [x] Check that AUX v2 is default for all new tenants on pageserver
- [ ] https://github.com/neondatabase/neon/issues/8349
- [ ] https://github.com/neondatabase/cloud/issues/15226
- [ ] https://github.com/neondatabase/neon/issues/6626
- [x] Figure out observability of lagging publisher (slot retaining a lot of WAL)
- [ ] https://github.com/neondatabase/neon/issues/5885
- [ ] https://github.com/neondatabase/neon/issues/8931
- [ ] https://github.com/neondatabase/cloud/issues/17261
- [ ] https://github.com/neondatabase/neon/issues/8619
- [ ] Logical slots are copied to the replica and prevent WAL truncation, see https://github.com/neondatabase/neon/pull/9425#discussion_r1804820659
Follow-ups (out of scope):
- https://github.com/neondatabase/neon/issues/6258
Other related tasks and Epics:
- https://github.com/neondatabase/cloud/issues/8892
https://neondb.slack.com/archives/C04DGM6SMTM/p1703091242312799
@arssher, will your fixes help with the first two items?
> Slot may disappear on restart (hard to reproduce, only occurred once)
I still don't know what that was. Stas tested manually and observed this once. There is a known path by which a slot might be lost, but it is highly unlikely (the endpoint is killed before the logical message is committed to the safekeepers), and Stas's case wasn't like that. We need more testing and a reproduction.
> In one case replication wasn't able to read WAL (hard to reproduce)
A more accurate description would be: "if a slot is lagging, replication might fail on compute start until the whole WAL tail is downloaded". We merged a cap on the maximum allowed lag, but to really fix this we need to bring on-demand WAL download from safekeepers to logical walsenders. We recently merged the core patch (https://github.com/neondatabase/neon/pull/5948), but using it in logical walsenders is a separate step. It shouldn't be hard, but I haven't started on it yet.
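To make "slot retaining a lot of WAL" and "cap on the maximum allowed lag" concrete, here is a minimal sketch of how a slot's retained WAL could be measured and compared against a cap. It assumes LSNs in PostgreSQL's textual `pg_lsn` form (`X/Y`, two hex halves of a 64-bit position), such as the values returned by `pg_current_wal_lsn()` and the `restart_lsn` column of `pg_replication_slots`; server-side, `pg_wal_lsn_diff()` computes the same difference. The function names and the `MAX_ALLOWED_LAG_BYTES` value are hypothetical and are not Neon's actual setting or implementation.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' into a 64-bit integer.

    The part before '/' holds the high 32 bits and the part after it the
    low 32 bits, both written in hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_wal_bytes(current_lsn: str, slot_restart_lsn: str) -> int:
    """Bytes of WAL between the slot's restart_lsn and the current WAL position.

    This is roughly how much WAL the slot prevents from being removed.
    Clamped at 0 in case the slot position is ahead of the sampled current LSN.
    """
    return max(0, parse_lsn(current_lsn) - parse_lsn(slot_restart_lsn))


# Hypothetical cap for illustration only (512 MiB); not Neon's real limit.
MAX_ALLOWED_LAG_BYTES = 512 * 1024 * 1024


def slot_exceeds_cap(
    current_lsn: str, slot_restart_lsn: str, cap: int = MAX_ALLOWED_LAG_BYTES
) -> bool:
    """Return True if the slot is lagging by more than the allowed cap."""
    return retained_wal_bytes(current_lsn, slot_restart_lsn) > cap


if __name__ == "__main__":
    # A slot 32 MiB behind the current position stays under the 512 MiB cap.
    print(retained_wal_bytes("0/3000000", "0/1000000"))
    print(slot_exceeds_cap("0/3000000", "0/1000000"))
```

Measuring lag in bytes of retained WAL, rather than in time, ties the check directly to what a lagging slot costs: the WAL tail that must be kept and, after a compute restart, downloaded before the walsender can proceed.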
@arssher, will you work on on-demand WAL download in walsenders? Is it part of this epic (i.e., will it block announcing GA for logical replication)?
@kelvich
> Slot may disappear on restart (hard to reproduce, only occurred once)

I think nobody has been able to reproduce it so far. A reasonable question: is it part of the epic's scope?
Renamed to "outbound" logical replication. When this Epic was started, it covered "only" logical replication, but there are now two different types of replication (outbound and inbound).
Discussion with Stas: improve pageserver performance first
This week:
- [x] Sasha: Finish the aux v2 rollout (last batch tomorrow morning)
This week:
- [x] Waiting for the new compute image rollout