Heikki Linnakangas
I filled in the blanks for records that we know are no-ops. I didn't try to compile it, but I believe that finishing this now won't require any particular Postgres...
> Please push more fixups into this branch to make it compile. No force pushes, no rebases, no merges from main please.

Ok, done.
> But as far as I know there are no artificial delays (i.e. sleeps...).

There is, see `autovacuum_vacuum_cost_delay` and https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-VACUUM-COST. With a small database, it will still finish reasonably...
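For illustration, a minimal `postgresql.conf` sketch of the cost-based vacuum delay knobs from the linked docs (values shown are the documented defaults; setting the delay to 0 disables the throttling, which you would normally only do for testing):

```
# Cost-based vacuum delay settings (see runtime-config-resource docs).
# autovacuum_vacuum_cost_delay defaults to 2ms; autovacuum sleeps this long
# each time the accumulated I/O cost reaches the limit below.
autovacuum_vacuum_cost_delay = 2ms

# Cost limit before sleeping; -1 means fall back to vacuum_cost_limit (200).
autovacuum_vacuum_cost_limit = -1

# To disable the throttling entirely (testing only):
# autovacuum_vacuum_cost_delay = 0
```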
It's still a potential issue. Not sure if XLP_FIRST_IS_OVERWRITE_CONTRECORD is ever created in practice in Neon, but I can't rule it out either. If we implement importing WAL from a...
I guess we need to create an empty delta layer to cover those records.
One idea on how to implement this: When the compute shuts down, we keep the subscriber client -> proxy connection open. When the compute wakes up again (because of some other...
Also related: https://github.com/neondatabase/neon/issues/1773
I spent some time looking at the logs from that time period, from different angles. I still don't understand what the root cause is. But a couple of suspicious /...
I found one instance of this stack trace in staging:

```
2023-11-06T06:38:44.817901Z INFO page_service_conn_main{peer_addr=10.10.15.121:52550}:process_query{tenant_id=813b336c71f6c9910c1009ef68cf7643 timeline_id=9a4160c0ffb4e6db06d528364efa7667}:handle_basebackup_request{lsn=Some(0/2016ED8) prev_lsn=None full_backup=false}: waiting for 0/2016ED8
2023-11-06T06:38:44.817968Z INFO page_service_conn_main{peer_addr=10.10.15.121:52550}:process_query{tenant_id=813b336c71f6c9910c1009ef68cf7643 timeline_id=9a4160c0ffb4e6db06d528364efa7667}:handle_basebackup_request{lsn=Some(0/2016ED8) prev_lsn=None full_backup=false}: taking basebackup lsn=0/2016ED8, prev_lsn=0/2016EA0
...
```
It's possible that this was fixed by https://github.com/neondatabase/neon/pull/6046. Next step is to wait until that is deployed, and monitor the logs to see if those errors still appear.