John Spray comments

Results: 365 comments of John Spray

pageserver: thrashing on compaction and eviction loop

You'll need to dig into the state that's carried on layers for eviction (there's a timestamp which is only sometimes modified). It'll be important not to end up in a...

pageserver: thrashing on compaction and eviction loop

> Indeed. But I would imagine that when pageserver is compacting and there is not enough disk, backpressure should kick in?

Not sure which backpressure mechanism you mean: generally the...

pageserver: thrashing on compaction and eviction loop

That comment is describing a heuristic for alerting on thrashing, rather than something that would actually prevent it.

test_lr_with_slow_safekeeper is flaky because logical_replication_sync not waiting for tablesync

@alexanderlaw looks like this one is still failing, can you check if there is a new failure mode, or is it the same thing?

Duplicated `disk_consistent_lsn` in `IndexPart`

As a precursor to fully removing the weird binary metadata blob, we could make a change to remove the `disk_consistent_lsn` Rust struct member, and just skip that field when decoding...

test_location_conf_churn fails with "could not read block" error on attempt to download deleted layer

The error means that one pageserver is trying to read a layer file that some other newer attached location already deleted. If the test is manually configuring endpoints, then it...

test_location_conf_churn fails with "could not read block" error on attempt to download deleted layer

The intent of the test is to exercise lots of permutations of state transitions for a tenant on the pageserver (i.e. all the code paths where we enter `TenantManager::upsert_location` and...

test_location_conf_churn fails with "could not read block" error on attempt to download deleted layer

That reproducer isn't failing on my machine, but I can see how the current conditions are insufficient: https://github.com/neondatabase/neon/pull/12059 @alexanderlaw can you let me know if you can still reproduce failures...

test_location_conf_churn fails with "could not read block" error on attempt to download deleted layer

> So I wonder, if the readability rules apply to compute only, what should prevent walreceiver from accessing missing layer files?

The way the test is currently written, nothing prevents that...

test_location_conf_churn fails with "could not read block" error on attempt to download deleted layer

#12059 should eliminate failures while reading with postgres, but we can still have ERROR-type failures from pageserver logs. @alexanderlaw would you mind adding log allow-list entries to this test, for...