John Spray

Results 452 comments of John Spray

This might make sense to clean up as we add Azure support to the scrubber in https://github.com/neondatabase/neon/issues/7547 -- that could just mean using remote_storage everywhere.
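To illustrate the idea of "using remote_storage everywhere": the scrubber would talk to object storage through one backend-agnostic interface instead of a backend-specific client, so adding Azure support doesn't fork every scan path. The trait and type names below are a hypothetical simplification, not the actual `remote_storage` API.

```rust
// Hypothetical simplification of a backend-agnostic storage interface.
// The real remote_storage crate differs; this only sketches the idea of
// letting the scrubber work against S3 or Azure through one trait.

trait ObjectStore {
    fn list_prefix(&self, prefix: &str) -> Vec<String>;
    fn download(&self, key: &str) -> Option<Vec<u8>>;
}

struct S3Store; // would wrap an S3 client
struct AzureStore; // would wrap an Azure Blob client

impl ObjectStore for S3Store {
    fn list_prefix(&self, _prefix: &str) -> Vec<String> {
        vec![]
    }
    fn download(&self, _key: &str) -> Option<Vec<u8>> {
        None
    }
}

impl ObjectStore for AzureStore {
    fn list_prefix(&self, _prefix: &str) -> Vec<String> {
        vec![]
    }
    fn download(&self, _key: &str) -> Option<Vec<u8>> {
        None
    }
}

// The scrubber only sees the trait, so adding a backend doesn't require
// a second code path for every scan.
fn scan_tenant(store: &dyn ObjectStore, tenant_prefix: &str) -> usize {
    store.list_prefix(tenant_prefix).len()
}

fn main() {
    let s3 = S3Store;
    let azure = AzureStore;
    println!("{}", scan_tenant(&s3, "tenants/"));
    println!("{}", scan_tenant(&azure, "tenants/"));
}
```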

Erik suggests that his recent changes might have pushed this past a timeout threshold.

The current situation is okay - this ticket tracks something we should make more efficient in the future.

@ephemeralsad I was confused for a few minutes... then realized that, to reproduce this, one must comment out the body of the `kick_secondary_download` function in the storage controller. That function is only compiled...
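To make the reproduction step concrete, here is a hypothetical stand-in showing what "commenting out the body" means; the real `kick_secondary_download` in the storage controller has a different signature, arguments, and compile-time gating.

```rust
// Made-up stand-in, purely to illustrate the reproduction step.
struct SecondaryLocation {
    node: String,
}

fn kick_secondary_download(secondary: &SecondaryLocation) {
    // Normally this would proactively ask the secondary location to start
    // downloading layers. To reproduce the issue, the body is commented out,
    // so secondaries only download on their own background schedule.
    //
    // request_download(&secondary.node);
    let _ = secondary;
}

fn main() {
    let secondary = SecondaryLocation {
        node: "pageserver-2".to_string(),
    };
    kick_secondary_download(&secondary);
    println!("kick for {} was a no-op", secondary.node);
}
```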

(notes from a chat with Arthur) Impact: interferes with writing clean tests. Currently, if a safekeeper has a stale remote_consistent_lsn for long enough, it will remain active & the pageserver will eventually connect...

We have larger plans to retire the broker, so this ticket is probably stale.

This looks like a pageserver bug in timeline shutdown, or something in that shutdown is pathologically slow. The controller is calling into the pageserver, and the logs trail off at "Waiting...

~8 seconds after we started shutting this tenant down, a page_service handler logs that it's shutting down:

```
data_attachments_8952e0edae10f2b3:2025-05-29T10:53:53.245869Z INFO page_service_conn_main{peer_addr=127.0.0.1:55052 application_name=81361 compute_mode=primary}:process_query{tenant_id=efcc9b039193ec1c68c1ec8218bc82f9 timeline_id=917bfbf9098d51dd0249356751fceade}:handle_pagerequests:request:handle_get_page_request{request_id=349442834189581 rel=1663/5/16396 blkno=53442 req_lsn=FFFFFFFF/FFFFFFFF not_modified_since_lsn=0/168EC020 shard_id=0000}: dropping...
```

It's super weird how consistently this is happening. It's failing every time in CI, not at all locally, and in both failures I looked at in detail, it is specifically...

@macdoos feel free to work on this: you might want to start by picking a particular service (e.g. safekeeper) to switch on auth by default. Then open separate PRs for...
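As a rough sketch of the kind of change "switch on auth by default" implies, here is a made-up config struct with the default flipped; the actual safekeeper/pageserver configuration fields and plumbing will differ.

```rust
// Made-up config struct; only illustrates flipping a default,
// not the real safekeeper configuration.
#[derive(Debug)]
struct ServiceConfig {
    auth_enabled: bool,
}

impl Default for ServiceConfig {
    fn default() -> Self {
        // Previously this would have been `auth_enabled: false`; the proposed
        // change is to make auth the default so the authenticated code paths
        // get exercised everywhere.
        ServiceConfig { auth_enabled: true }
    }
}

fn main() {
    println!("{:?}", ServiceConfig::default());
}
```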