John Spray
You'll need to dig into the state that's carried on layers for eviction (there's a timestamp which is only sometimes modified). It'll be important not to end up in a...
> Indeed. But I would imagine that when pageserver is compacting and there is not enough disk, backpressure should kick in? Not sure which backpressure mechanism you mean: generally the...
That comment is describing a heuristic for alerting on thrashing, rather than something that would actually prevent it.
@alexanderlaw, it looks like this one is still failing. Can you check whether there is a new failure mode, or whether it's the same thing?
As a precursor to fully removing the weird binary metadata blob, we could make a change to remove the disk_consistent_lsn rust struct member, and just skip that field when decoding...
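A minimal sketch of the "skip that field when decoding" idea, in Rust. The byte layout, the `Metadata` struct, `other_lsn`, and `decode` are all hypothetical and are not the pageserver's actual metadata format. The only point is that the decoder advances past the legacy `disk_consistent_lsn` bytes without storing them in any struct member.

```rust
use std::convert::TryInto;

/// Decoded metadata with no `disk_consistent_lsn` member.
/// Field names and layout are hypothetical, for illustration only.
#[derive(Debug, PartialEq)]
pub struct Metadata {
    pub format_version: u16,
    pub other_lsn: u64,
}

// Hypothetical on-disk layout (big-endian):
//   [format_version: u16][disk_consistent_lsn: u64][other_lsn: u64]
const VERSION_LEN: usize = 2;
const SKIPPED_LSN_LEN: usize = 8; // legacy disk_consistent_lsn slot, still present on disk
const OTHER_LSN_LEN: usize = 8;

/// Decode the blob, reading past the legacy field without keeping it.
pub fn decode(buf: &[u8]) -> Result<Metadata, String> {
    let needed = VERSION_LEN + SKIPPED_LSN_LEN + OTHER_LSN_LEN;
    if buf.len() < needed {
        return Err(format!("metadata blob too short: {} < {}", buf.len(), needed));
    }

    let format_version = u16::from_be_bytes([buf[0], buf[1]]);

    // Skip the disk_consistent_lsn bytes: advance the offset, store nothing.
    let offset = VERSION_LEN + SKIPPED_LSN_LEN;

    let other_lsn = u64::from_be_bytes(
        buf[offset..offset + OTHER_LSN_LEN]
            .try_into()
            .expect("slice length checked above"),
    );

    Ok(Metadata { format_version, other_lsn })
}
```

Because the skipped bytes are still read past rather than removed from the format, blobs written before the change would still decode under this sketch; only the in-memory struct loses the member.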
The error means that one pageserver is trying to read a layer file that some other newer attached location already deleted. If the test is manually configuring endpoints, then it...
The intent of the test is to exercise lots of permutations of state transitions for a tenant on the pageserver (i.e. all the code paths where we enter `TenantManager::upsert_location` and...
That reproducer isn't failing on my machine, but I can see how the current conditions are insufficient: https://github.com/neondatabase/neon/pull/12059 @alexanderlaw can you let me know if you can still reproduce failures...
> So I wonder, if the readability rules apply to compute only, what should prevent walreceiver from accessing missing layer files? The way the test is currently written, nothing prevents that...
#12059 should eliminate failures while reading with Postgres, but the test can still hit ERROR-level failures from the pageserver logs. @alexanderlaw would you mind adding log allow-list entries to this test, for...