Thang Pham

235 comments by Thang Pham

To summarize what we have done with the current backpressure approach:

- I added backpressure tests in https://github.com/neondatabase/neon/pull/1919 that confirm the large read latencies reported in https://github.com/neondatabase/neon/issues/1763
- @knizhnik added a...

> 1. Even without last written LSN cache (pageserver#177) I have not seen large latencies with #1919 and #1763 tests with `max-replication_write_lag=15MB` and `wal_log_hints=off`, Second one is important because when...
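For readers new to the mechanism, here is a minimal Rust sketch of a write-lag backpressure check. It assumes that backpressure means a WAL-generating backend pauses while the pageserver's last received LSN trails the current WAL insert LSN by more than the configured limit (15MB in the quoted setup). `Lsn`, `MB`, `should_throttle`, and the byte-offset model of LSNs are hypothetical simplifications, not Neon's implementation.

```rust
// Hypothetical model of a write-lag backpressure check; not Neon's implementation.
// LSNs are modeled as plain byte offsets into the WAL stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

pub const MB: u64 = 1024 * 1024;

/// Returns true if a WAL-generating backend should wait before writing more WAL:
/// the pageserver's last received LSN trails the insert LSN by more than `max_write_lag`.
pub fn should_throttle(insert_lsn: Lsn, pageserver_received_lsn: Lsn, max_write_lag: u64) -> bool {
    // saturating_sub: a pageserver that is (momentarily) ahead counts as zero lag.
    let lag = insert_lsn.0.saturating_sub(pageserver_received_lsn.0);
    lag > max_write_lag
}
```

For example, with a 15 MB limit, `should_throttle(Lsn(100 * MB), Lsn(90 * MB), 15 * MB)` is `false` (10 MB behind) and `should_throttle(Lsn(100 * MB), Lsn(80 * MB), 15 * MB)` is `true` (20 MB behind). In this model only WAL-generating backends would call the check; the WAL sender keeps streaming regardless, which matches the answer further down that it isn't affected by backpressure.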

It looks like this branch is based on the code in #1891, since `record_read_latency` isn't defined anywhere in `main`. The new test is similar to the tests added in #1891....

Thanks for the detailed description; it really helps with investigating the issue. I'm taking a look at this now. It seems that this issue happens when we "Restarting all safekeepers and...

My guess is that some timeout forces etcd to open a new connection, which would explain why the insert request takes about 15 seconds every time.

> to be the very related but ignored is when all safekeepers report some Lsn of it that's below what pageserver has: this way they are never considered by pageserver...

One way to fix this is to modify `WalreceiverState::applicable_connection_candidates` to accept a new candidate with `etcd_info.timeline.commit_lsn >= Some(self.local_timeline.get_last_record_lsn())` if the pageserver has no connection, and otherwise to use the strict comparison `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())`. @SomeoneToIgnore WDYT?
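A minimal Rust sketch of the proposed rule, using simplified stand-ins: `Lsn`, `Candidate`, `is_applicable`, and `applicable_candidates` are hypothetical and only mirror the comparison described above, not the real `WalreceiverState` code. It relies on Rust's `Option` ordering, where `None` sorts below any `Some(_)`, so a candidate without a `commit_lsn` is rejected by both comparisons.

```rust
// Hypothetical, simplified stand-ins for the pageserver types named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

pub struct Candidate {
    pub safekeeper_id: u64,
    // Commit LSN the safekeeper advertised (via etcd in the real system).
    pub commit_lsn: Option<Lsn>,
}

/// Proposed rule: with no active connection, a safekeeper whose commit_lsn equals
/// our last record LSN is acceptable; with a connection, it must be strictly ahead.
pub fn is_applicable(candidate: &Candidate, last_record_lsn: Lsn, has_connection: bool) -> bool {
    if has_connection {
        candidate.commit_lsn > Some(last_record_lsn)
    } else {
        candidate.commit_lsn >= Some(last_record_lsn)
    }
}

/// Returns the ids of all candidates that pass the rule, in input order.
pub fn applicable_candidates(
    candidates: &[Candidate],
    last_record_lsn: Lsn,
    has_connection: bool,
) -> Vec<u64> {
    candidates
        .iter()
        .filter(|c| is_applicable(c, last_record_lsn, has_connection))
        .map(|c| c.safekeeper_id)
        .collect()
}
```

With a last record LSN of 100 and no connection, a safekeeper reporting exactly 100 becomes eligible, so the pageserver can reconnect even when no safekeeper is ahead of it. Once connected, only strictly-ahead safekeepers qualify, presumably to avoid switching to a peer that has no new WAL to offer.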

> Why does this work normally after 15s?

Probably the compute node has some timeout or some background job that sends WAL regardless of the lag value. I dig...

> 1. Is it correct that there is a part of the compute that sends WAL to Safekeepers ignoring backpressure?

Yes. The WAL sender isn't affected by backpressure. It sends...

> > This startup issue happens consistently
>
> Hmm, if I try `RUST_LOG=debug poetry run pytest -k 'test_wal_generate[simple]'`, I do not see such behavior, do you?

The difference is...