Noah Watkins
Note that the `Protocol violation` didn't occur in any of the extra reported cases in this issue. The scenarios under which the protocol violation might occur are becoming more exotic...
Looked into this again, and it continues to be a mystery. The leading contender for the root cause is a bug in the reconnect_transport resetting logic and a reply...
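For my own notes, the suspected race can be sketched like this. This is a hypothetical `Transport` in Python, not the actual reconnect_transport code; `send`, `reset`, and `on_reply` are illustrative names, and the "bug" is only a guess at the shape of the problem:

```python
# Hypothetical sketch: a transport reset that drops in-flight requests but
# rewinds the correlation-id counter lets a stale reply from the old
# connection be matched to a new, unrelated request.

class Transport:
    def __init__(self):
        self.next_correlation_id = 0
        self.pending = {}  # correlation_id -> request name

    def send(self, name):
        cid = self.next_correlation_id
        self.next_correlation_id += 1
        self.pending[cid] = name
        return cid

    def reset(self):
        # Suspected bug shape: pending requests are cleared, but the
        # counter is rewound, so new requests reuse old correlation ids.
        self.pending.clear()
        self.next_correlation_id = 0

    def on_reply(self, cid):
        return self.pending.pop(cid, None)

t = Transport()
old_cid = t.send("metadata")    # correlation id 0 on the old connection
t.reset()                       # connection drops; counter rewinds
new_cid = t.send("heartbeat")   # correlation id 0 again on the new connection
# A late reply carrying the *old* id is now matched to the *new* request:
assert old_cid == new_cid
assert t.on_reply(old_cid) == "heartbeat"
```

If the counter were monotonic across resets (or ids were tagged with a connection epoch), the stale reply would simply find no pending entry and be dropped.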
An interesting bit is here:

```
INFO 2022-08-04 20:23:48,024 [shard 1] rpc - parse_utils.h:59 - rpc header missmatching checksums. expected:20480, got:1044318496 - {version:0, header_checksum:20480, compression:0, payload_size:786432, meta:131072, correlation_id:2097152, payload_checksum:0}
```

...
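The decoded fields all being suspiciously round powers of two is consistent with the parser landing at the wrong byte offset. A minimal sketch of how a header checksum catches a misaligned read, using a made-up header layout (this is not Redpanda's actual wire format, just an illustration of the failure mode):

```python
import struct
import zlib

# Hypothetical little-endian header layout for illustration only:
# version(u8), header_checksum(u32), compression(u8), payload_size(u32),
# meta(u32), correlation_id(u32), payload_checksum(u64).
HEADER_FMT = "<BIBIIIQ"
CRC_PREFIX = "<BI"  # version + header_checksum precede the checksummed body

def pack_header(version, compression, payload_size, meta, cid, payload_crc):
    body = struct.pack("<BIIIQ", compression, payload_size, meta, cid, payload_crc)
    return struct.pack(CRC_PREFIX, version, zlib.crc32(body)) + body

def parse_header(buf):
    fields = struct.unpack_from(HEADER_FMT, buf, 0)
    version, expected = fields[0], fields[1]
    got = zlib.crc32(buf[struct.calcsize(CRC_PREFIX):struct.calcsize(HEADER_FMT)])
    if got != expected:
        # Same shape as the log line above.
        raise ValueError(f"rpc header mismatching checksums. expected:{expected}, got:{got}")
    return {"version": version, "payload_size": fields[3], "correlation_id": fields[5]}

good = pack_header(version=1, compression=0, payload_size=4096, meta=7, cid=42, payload_crc=0)
assert parse_header(good)["correlation_id"] == 42

# Start parsing one byte early (e.g. a framing bug) and every field decodes
# to garbage; the header checksum is what turns that into a loud error
# instead of a silently corrupt request.
try:
    parse_header(b"\x00" + good)
except ValueError as err:
    print(err)
```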
@ztlpn
1. Was this the same test that this ticket is tracking?
2. Did you see the same Protocol violation?

Yesterday we changed the handling for the protocol violation so...
Started a new round of debug runs: https://github.com/redpanda-data/redpanda/pull/5931

Still unable to reproduce locally.
Note here for myself:

```
rptest.services.utils.BadLogLines:
```

Extra context in the logs strongly suggests that this _isn't_ a concurrency bug, as it looks like no transport resets are occurring between request...
this is a tough one
@Lazin any more context for this? sha1, cdt, custom cluster, test/no-test?
> Seen again - seems to be the same as the one I reported yesterday: https://ci-artifacts.dev.vectorized.cloud/redpanda/018281c1-1808-413b-b7e8-9e72e8d0082e/vbuild/ducktape/results/2022-08-09--001/PartitionMovementUpgradeTest/test_basic_upgrade/137/ > > ``` > [INFO - 2022-08-09 09:41:23,203 - runner_client - log - lineno:278]:...
@BenPope was that from a 22.1.x run? Perhaps it's just that the fix needs to be backported?