John Spray
This week:
- Deploy patch stack 2
- PRs for remaining write path changes (stretch: including on-demand downloads)
A recent failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6407/8295888842/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/fbe892c34b63cdb
Next:
- Implement the imitate-only task so that we can disable time-based eviction.
- Engage the CP team to agree on a new API for exposing a size heuristic to unblock moving...
Next step:
- Define the interface for CP for utilization
- Avoid taking tenant locks when collecting layers to evict.
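For illustration only, here is a minimal sketch of one way to collect eviction candidates without blocking on a per-tenant lock: try the lock, and skip any tenant whose lock is currently held. The `Tenant` struct, its `layers` field, and `collect_candidates` are hypothetical names invented for this sketch; they are not the pageserver's real types, and this is not necessarily the approach that will be taken.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a tenant's layer bookkeeping (not the real type).
struct Tenant {
    layers: Mutex<Vec<String>>,
}

/// Collect eviction candidates without ever waiting on a tenant lock.
/// Tenants whose lock is busy (or poisoned) are skipped and reported back,
/// so the caller can retry them on a later iteration.
fn collect_candidates(tenants: &HashMap<u32, Arc<Tenant>>) -> (Vec<String>, Vec<u32>) {
    let mut candidates = Vec::new();
    let mut skipped = Vec::new();
    for (id, tenant) in tenants {
        match tenant.layers.try_lock() {
            // Lock was free: copy out the layer names and release it right away.
            Ok(layers) => candidates.extend(layers.iter().cloned()),
            // Lock is held elsewhere (or poisoned): do not block, just skip.
            Err(_) => skipped.push(*id),
        }
    }
    (candidates, skipped)
}
```

The trade-off in this sketch is that a skipped tenant contributes no candidates for that pass, so the caller has to tolerate an incomplete view and retry later.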
Extra notes:
- The try_lock change was reverted for lack of evidence that it was the underlying cause
- So the ~10 minute hang is probably still in there: expect to...
> split https://github.com/neondatabase/neon/pull/7030, get reviews throughout the week

Note to self: this is about hangs in disk-usage-based eviction while collecting layers.
One of the reasons we didn't see this more widely is that nonzero shards still often do SLRU writes during ingest, even if they're not ingesting any of the data...
Changes in addition to review comments: - Making a stable test for this has highlighted the need to skip SLRU and checkpoint content on >0 shards, otherwise they're far too...
I backed out the changes to ingest logic, and loosened the test slightly to tolerate that. The ingest changes were kind of ugly/fragile, and in any case it was incorrect...
Status:
- Database persistence landed Friday
- APIs for control plane integration are under review today
- Work has started on deploying what we currently have into staging, to unblock...