John Spray

330 comments by John Spray

Probably a recurrence. FAIL test: FranzGoVerifiableWithSiTest.test_si_without_timeboxed.segment_size=104857600 (1/1 runs) failure at 2022-08-04T08:40:43.589Z: NodeCrash([(, "ERROR 2022-08-04 06:34:41,875 [shard 0] assert - Assert failure: (../../../src/v/storage/segment_reader.cc:215) '_parent == nullptr' Must close before destroying\n")]) in job...

This ran 200x locally without failing. In CI we've seen the original failure very rarely, so it may be a while before any of these trip.

Those failures are highlighting a nasty behavior where VerifiableProducer can be instantiated on two nodes and their outputs will fight for updates to last_acked_offsets. That's a bug, fixed. Unfortunately the...

Those last failures were because the logic for whether to use strict validation or not (single or multiple producers) was flipped. Fixed.

@NyaliaLui please could you re-review + clear your :red_circle: if all good

Observations:
- Crash is happening several minutes after startup of 22.1.7 on docker-rp-2, so unlikely to be an issue replaying/decoding the content that an earlier version wrote (this is all...

I don't see us fixing any raft crashes in the src/v/raft commit history 22.1->22.2, so it may well be that this issue is still present and we only saw it in...

This may or may not be related, but there's a big rework of the kgo-verifier wrapper services in https://github.com/redpanda-data/redpanda/pull/6059 that should make them more robust (stop running them captively on...

https://buildkite.com/redpanda/redpanda/builds/12246#0181d92a-37c4-4db7-9bf2-b481930a9515

The rpk group_describe function is meant to be a retry loop, but wasn't correctly handling the case of the coordinator being unavailable -- I hit the same thing with scale...
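
For illustration only, here is a minimal sketch of the kind of retry loop described above, where an unavailable coordinator is treated as a retryable condition rather than a hard failure. This is not the actual rpk wrapper code: `CoordinatorUnavailable`, `describe_with_retry`, and the `describe` callable are hypothetical names made up for this example.

```python
import time


class CoordinatorUnavailable(Exception):
    """Hypothetical error: the group coordinator is not available yet."""


def describe_with_retry(describe, attempts=5, backoff_s=1.0, sleep=time.sleep):
    """Call describe() until it succeeds or the attempts run out.

    `describe` is any zero-argument callable (a hypothetical stand-in for
    whatever runs the group describe command). Only CoordinatorUnavailable
    is retried; any other exception propagates immediately.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return describe()
        except CoordinatorUnavailable as e:
            last_error = e
            # Wait before the next try, but not after the final attempt.
            if attempt + 1 < attempts:
                sleep(backoff_s)
    # Every attempt hit an unavailable coordinator: surface the last error.
    raise last_error
```

The point of the sketch is that the "coordinator unavailable" case sits inside the loop, so a transient coordinator outage costs one more iteration instead of failing the caller.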