Vlad Lazar
"The test_rack_awareness failed in debug mode only. I checked, and it is related to the fact that the nodes are slow, i.e. recovery is making very slow progress. I am going...
https://buildkite.com/redpanda/redpanda/builds/26162#018737a2-3d1e-44b9-a979-7f117b1b243f https://buildkite.com/redpanda/redpanda/builds/26162#018737b3-d248-4be3-aac0-859f5bb01adf
Another couple failures with the same failure mode:

* FAIL test: ConsumerGroupTest.test_dead_group_recovery.static_members=False (2/36 runs)
  + failure at 2022-10-28T13:14:32.221Z: in job https://buildkite.com/redpanda/redpanda/builds/17516#01841e3c-a9c7-4689-911c-4d39a0df2944
  + failure at 2022-10-27T19:32:22.027Z: in job https://buildkite.com/redpanda/redpanda/builds/17464#01841a73-7c5a-4ee0-92f2-a1bba1376b98
* FAIL test: ConsumerGroupTest.test_dead_group_recovery.static_members=True (1/29 runs)
  + failure at 2022-11-03T07:37:41.574Z: in job https://buildkite.com/redpanda/redpanda/builds/17827#01843bef-b5c0-4258-b4bc-98e6476f9b6a

I've also seen this fail in my Azure CDT runs fairly reliably (same failure mode). I'm using [Standard_L8s_v3](https://learn.microsoft.com/en-us/azure/virtual-machines/lsv3-series) nodes for Redpanda and [Standard_D4ds_v4](https://learn.microsoft.com/en-us/azure/virtual-machines/ddv4-ddsv4-series) for the client.
Status:
* Implementation [PR](https://github.com/neondatabase/neon/pull/6576) is open - needs debugging of regress test failures, but otherwise reviewable
* This week:
  * fix bugs surfaced by regress tests
  * update pagebench to...
#### Status

Last week:
* stabilised impl https://github.com/neondatabase/neon/pull/6576
* went through one round of review

This week:
* pagebench
* Disk IO improvements

Last week:
* merged https://github.com/neondatabase/neon/pull/6576
* benchmarking
* opened https://github.com/neondatabase/neon/pull/6780
* more validation testing using `get_page_latest_lsn` bench

This week:
* merge disk IO stuff
* start deployment

Last week:
* released to staging which caused panics
* identified the issues in delta layer index traversal

This week:
* fix the issues mentioned above and write tests for...

Last week:
* Deployed to IL and monitored: looking good
  + basebackup latency stayed stable which was expected (low slru count)
  + vectored latency was lower after deploy (promising)

This...