John Spray
I think I ended up the assignee here because of my previous change to the describe groups function, but am not actively working on it.
Notwithstanding the overly aggressive retries, for systems not connected to the internet one should set `enable_metrics_reporter` to false.
This isn't a bug in the verifiers if it really does just take that long -- the test should use a timeout when waiting that matches the amount of data...
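One way to make the wait scale with the data volume, sketched under assumptions: the expected consume throughput, the 2x safety factor and the 30 s floor are all illustrative values, and `consume_timeout_sec` is a hypothetical helper, not something that exists in the test suite.

```python
def consume_timeout_sec(total_bytes: int,
                        expected_bytes_per_sec: float,
                        safety_factor: float = 2.0,
                        floor_sec: float = 30.0) -> float:
    """Hypothetical helper: derive a wait timeout from the amount of data.

    The estimate is total_bytes / expected_bytes_per_sec, padded by
    safety_factor, and never shorter than floor_sec so small runs still
    get a sane minimum.
    """
    if expected_bytes_per_sec <= 0:
        raise ValueError("expected_bytes_per_sec must be positive")
    estimate = total_bytes / expected_bytes_per_sec
    return max(floor_sec, estimate * safety_factor)
```

For example, 10 GiB at an assumed 100 MiB/s gives about 102 s of expected consume time, so roughly 205 s with the safety factor. In a ducktape test the result could be passed as `timeout_sec` to `wait_until`, so that the timeout grows with the workload instead of being a fixed constant.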
@andrwng kgo-verifier takes two CLI arguments to control logging: `-debug` makes it output logs from the verifier code (this'll give you messages that show when it's e.g. entering & leaving...
There's a similar sort of symptom in https://github.com/redpanda-data/redpanda/issues/6411 where it looks like a consumer is hanging. In https://github.com/redpanda-data/redpanda/issues/6413, a consumer is getting persistent NOT_LEADER_FOR_PARTITION errors while querying offsets -- it's...
Interestingly, when I made a change that accidentally caused Redpanda to fail to start in this test[1], it manifested as a hang (Buildkite run timeout)...
https://buildkite.com/redpanda/redpanda/builds/16090#0183a184-2c91-49f1-8d86-c7b9d1db5df7

Since the verbose logging went in, we can see where the consumer is hung:

```
time="2022-10-04T07:14:14Z" level=debug msg="Read OK (000000.000000000000033223) on p=2 at o=33223"
time="2022-10-04T07:14:14Z" level=debug msg="Calling PollFetches (last_read=[33467...
```
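When digging through long verifier logs, a small script can pull out the last successful read per partition and the final message before the hang. This is a minimal sketch that assumes the logrus-style `time=... level=... msg="..."` layout and the `Read OK (...) on p=N at o=M` message shown above. The helper names are hypothetical and not part of kgo-verifier.

```python
import re

# Assumed logrus text format, e.g.:
# time="2022-10-04T07:14:14Z" level=debug msg="Read OK (...) on p=2 at o=33223"
# The msg group tolerates backslash-escaped quotes inside the message.
LINE_RE = re.compile(
    r'time="(?P<time>[^"]*)"\s+level=(?P<level>\w+)\s+'
    r'msg="(?P<msg>(?:[^"\\]|\\.)*)"'
)
# Assumed shape of the per-record success message.
READ_OK_RE = re.compile(r'Read OK \([^)]*\) on p=(?P<p>\d+) at o=(?P<o>\d+)')


def last_reads(lines):
    """Return {partition: (offset, time)} for the latest 'Read OK' per partition."""
    last = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        r = READ_OK_RE.search(m.group("msg"))
        if r:
            last[int(r.group("p"))] = (int(r.group("o")), m.group("time"))
    return last


def last_message(lines):
    """Return the msg of the final parseable line, i.e. where the consumer stopped."""
    msg = None
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            msg = m.group("msg")
    return msg
```

Comparing the `last_reads` offsets against the partition high watermarks would show which partitions the consumer stopped making progress on.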
I think I see the fetches for the partition right before a node restart:

```
# The one that gives us that first 8070 messages
TRACE 2022-10-04 07:14:13,347 [shard 1]...
```
I am trying Noah's new ci-repeat hotness to do a repeated run of this test with franz-go debug logs here https://github.com/redpanda-data/redpanda/pull/6620
I note that the client hang starts while Redpanda is on version v22.1.7, which does not have the offset_for_leader epoch fix https://github.com/redpanda-data/redpanda/pull/6400 -- this is backported to v22.1.x but not...