
cl/replication/tests: deflake test_with_restart

Open bharathv opened this issue 1 month ago • 15 comments

See individual commits.

Fixes: https://redpandadata.atlassian.net/browse/CORE-14672

Backports Required

  • [ ] none - not a bug fix
  • [ ] none - this is a backport
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [x] v25.3.x
  • [ ] v25.2.x
  • [ ] v25.1.x
  • [ ] v24.3.x

Release Notes

  • none

bharathv avatar Nov 19 '25 22:11 bharathv

/ci-repeat 1 dt-repeat=50 tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingReplicationTests.test_with_restart

bharathv avatar Nov 19 '25 22:11 bharathv

CI test results

test results on build#76687
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ShadowLinkingReplicationTests test_with_restart null integration https://buildkite.com/redpanda/redpanda/builds/76687#019a9e5b-88d8-46ac-8426-03f412a79e4c FLAKY 69/70 upstream reliability is '99.71910112359551'. current run reliability is '95.23809523809523'. drift is 4.48101 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_with_restart
test results on build#76770
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ShadowLinkingReplicationTests test_with_restart null integration https://buildkite.com/redpanda/redpanda/builds/76770#019aa4ce-a677-49dd-b9b5-d50a81c84696 FLAKY 118/120 upstream reliability is '99.75'. current run reliability is '90.9090909090909'. drift is 8.84091 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_with_restart
test results on build#77384
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ShadowLinkConsumeGroupsMirroringTest test_consumer_groups_mirroring {"source_cluster_spec": {"cluster_type": "kafka", "kafka_quorum": "COMBINED_KRAFT", "kafka_version": "3.8.0"}} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed14-24c9-4d80-b6cc-1aa9770be6f8 FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkConsumeGroupsMirroringTest&test_method=test_consumer_groups_mirroring
ShadowLinkConsumeGroupsMirroringTest test_consumer_groups_mirroring {"source_cluster_spec": {"cluster_type": "kafka", "kafka_quorum": "COMBINED_KRAFT", "kafka_version": "3.8.0"}} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed19-fb10-405a-b47e-c36f79b30dc9 FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkConsumeGroupsMirroringTest&test_method=test_consumer_groups_mirroring
ShadowLinkConsumeGroupsMirroringTest test_consumer_groups_mirroring {"source_cluster_spec": {"cluster_type": "redpanda"}} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed14-24ca-4a8f-b7c6-7505f47ec84f FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkConsumeGroupsMirroringTest&test_method=test_consumer_groups_mirroring
ShadowLinkConsumeGroupsMirroringTest test_consumer_groups_mirroring {"source_cluster_spec": {"cluster_type": "redpanda"}} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed19-fb12-4e6b-a1f4-a140a951d038 FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkConsumeGroupsMirroringTest&test_method=test_consumer_groups_mirroring
ShadowLinkingReplicationTests test_topic_delete {"source_cluster_spec": {"cluster_type": "redpanda"}} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed19-fb12-4e6b-a1f4-a140a951d038 FLAKY 16/21 upstream reliability is '93.75'. current run reliability is '76.19047619047619'. drift is 17.55952 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_topic_delete
ConsumerOffsetsRecoveryTest test_consumer_offsets_partition_recovery {"force_offset_upload_failures": false} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed14-24c7-4fba-bd9e-a073508755ab FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerOffsetsRecoveryTest&test_method=test_consumer_offsets_partition_recovery
ConsumerOffsetsRecoveryTest test_consumer_offsets_partition_recovery {"force_offset_upload_failures": false} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed19-fb0e-4e6c-aae5-162306efff37 FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerOffsetsRecoveryTest&test_method=test_consumer_offsets_partition_recovery
ConsumerOffsetsRecoveryTest test_consumer_offsets_partition_recovery {"force_offset_upload_failures": true} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed14-24c9-4d80-b6cc-1aa9770be6f8 FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerOffsetsRecoveryTest&test_method=test_consumer_offsets_partition_recovery
ConsumerOffsetsRecoveryTest test_consumer_offsets_partition_recovery {"force_offset_upload_failures": true} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed19-fb10-405a-b47e-c36f79b30dc9 FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerOffsetsRecoveryTest&test_method=test_consumer_offsets_partition_recovery
ConsumerOffsetsRecoveryToolTest test_consumer_offsets_partition_count_change null integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed14-24ca-4a8f-b7c6-7505f47ec84f FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerOffsetsRecoveryToolTest&test_method=test_consumer_offsets_partition_count_change
ConsumerOffsetsRecoveryToolTest test_consumer_offsets_partition_count_change null integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed19-fb12-4e6b-a1f4-a140a951d038 FAIL 0/21 The test has failed across all retries https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerOffsetsRecoveryToolTest&test_method=test_consumer_offsets_partition_count_change
NodesDecommissioningTest test_decommissioning_rebalancing_node {"shutdown_decommissioned": true} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed14-24c7-4fba-bd9e-a073508755ab FLAKY 16/21 upstream reliability is '92.3507462686567'. current run reliability is '76.19047619047619'. drift is 16.16027 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_decommissioning_rebalancing_node
RedpandaNodeOperationsSmokeTest test_node_ops_smoke_test {"cloud_storage_type": 1, "mixed_versions": true} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed19-fb0a-41d0-adb2-9cbcc3d8a764 FLAKY 9/21 upstream reliability is '86.11111111111111'. current run reliability is '42.857142857142854'. drift is 43.25397 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test
WriteCachingFailureInjectionE2ETest test_crash_all {"use_transactions": false} integration https://buildkite.com/redpanda/redpanda/builds/77384#019aed14-24c6-4907-80ce-4da059ade838 FLAKY 20/21 upstream reliability is '88.67521367521367'. current run reliability is '95.23809523809523'. drift is -6.56288 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all
test results on build#77449
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ScalingUpTest test_fast_node_addition null integration https://buildkite.com/redpanda/redpanda/builds/77449#019af002-5f1e-4eba-887e-ace2b85628c4 FLAKY 29/31 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0262, p0=0.5495, reject_threshold=0.0100. adj_baseline=0.0766, p1=0.3191, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_fast_node_addition
test results on build#77471
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ShadowLinkingReplicationTests test_topic_delete {"source_cluster_spec": {"cluster_type": "kafka", "kafka_quorum": "COMBINED_KRAFT", "kafka_version": "3.8.0"}} integration https://buildkite.com/redpanda/redpanda/builds/77471#019af11d-4e4b-4fee-97cd-2961bd4772ad FLAKY 8/11 Test FAILS after retries.Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_topic_delete
ShadowLinkingReplicationTests test_topic_delete {"source_cluster_spec": {"cluster_type": "redpanda"}} integration https://buildkite.com/redpanda/redpanda/builds/77471#019af120-483b-4788-a5bd-41cac02ed7ac FLAKY 14/21 Test FAILS after retries.Significant increase in flaky rate(baseline=0.0728, p0=0.0024, reject_threshold=0.0100) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_topic_delete
ConsumerOffsetsConsistencyTest test_flipping_leadership null integration https://buildkite.com/redpanda/redpanda/builds/77471#019af11d-4e4b-4fee-97cd-2961bd4772ad FLAKY 4/11 Test FAILS after retries.Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerOffsetsConsistencyTest&test_method=test_flipping_leadership
ConsumerOffsetsConsistencyTest test_flipping_leadership null integration https://buildkite.com/redpanda/redpanda/builds/77471#019af120-483a-48af-bb77-78d7f8310698 FLAKY 3/11 Test FAILS after retries.Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerOffsetsConsistencyTest&test_method=test_flipping_leadership
DatalakeDLQTest test_dlq_table_for_mixed_records {"catalog_type": "rest_jdbc", "cloud_storage_type": 1, "query_engine": "spark"} integration https://buildkite.com/redpanda/redpanda/builds/77471#019af11d-4e47-44ed-996e-bee3197e8e20 FLAKY 49/51 Test PASSES after retries.Inconclusive result after max retries(baseline=0.0005, p0=0.0251, reject_threshold=0.0100. adj_baseline=0.0015, p1=0.9973, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=DatalakeDLQTest&test_method=test_dlq_table_for_mixed_records
NodesDecommissioningTest test_decommissioning_rebalancing_node {"shutdown_decommissioned": true} integration https://buildkite.com/redpanda/redpanda/builds/77471#019af120-483a-48af-bb77-78d7f8310698 FLAKY 16/21 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0896, p0=0.0982, reject_threshold=0.0100. adj_baseline=0.2455, p1=0.4330, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_decommissioning_rebalancing_node
NodesDecommissioningTest test_flipping_decommission_recommission {"cloud_topic": true, "node_is_alive": false} integration https://buildkite.com/redpanda/redpanda/builds/77471#019af11d-4e4b-4fee-97cd-2961bd4772ad FLAKY 50/51 Test PASSES after retries.Inconclusive result after max retries(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.0000, p1=1.0000, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_flipping_decommission_recommission
PartitionMoveInterruption test_cancellations_interrupted_with_restarts {"replication_factor": 3} integration https://buildkite.com/redpanda/redpanda/builds/77471#019af120-4839-4ea2-9cfc-ce0cb191a923 FLAKY 7/11 Test FAILS after retries.Significant increase in flaky rate(baseline=0.0000, p0=0.0000, reject_threshold=0.0100) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionMoveInterruption&test_method=test_cancellations_interrupted_with_restarts
RedpandaNodeOperationsSmokeTest test_node_ops_smoke_test {"cloud_storage_type": 1, "mixed_versions": false} integration https://buildkite.com/redpanda/redpanda/builds/77471#019af11d-4e46-4544-9ed1-36f5e23530a1 FLAKY 38/41 Test is INCONCLUSIVE after retries.Inconclusive result before max retries(baseline=0.0115, p0=0.0773, reject_threshold=0.0100. adj_baseline=0.0341, p1=0.8449, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test
ScalingUpTest test_fast_node_addition null integration https://buildkite.com/redpanda/redpanda/builds/77471#019af120-4839-4ea2-9cfc-ce0cb191a923 FLAKY 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0261, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.0763, p1=0.4520, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_fast_node_addition
ScalingUpTest test_moves_with_local_retention {"use_topic_property": true} integration https://buildkite.com/redpanda/redpanda/builds/77471#019af120-483b-4788-a5bd-41cac02ed7ac FLAKY 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0253, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.0740, p1=0.4637, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_moves_with_local_retention
WriteCachingFailureInjectionTest test_unavoidable_data_loss null integration https://buildkite.com/redpanda/redpanda/builds/77471#019af120-4835-4b83-97dc-d4670e4190fb FLAKY 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0515, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1466, p1=0.2050, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionTest&test_method=test_unavoidable_data_loss
test results on build#77540
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ShadowLinkingReplicationTests test_with_restart null integration https://buildkite.com/redpanda/redpanda/builds/77540#019affa1-1829-42a0-9adf-6d6add4d58be FLAKY 59/70 upstream reliability is '100.0'. current run reliability is '62.06896551724138'. drift is 37.93103 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_with_restart
ShadowLinkingReplicationTests test_with_restart null integration https://buildkite.com/redpanda/redpanda/builds/77540#019affa1-182c-4c7a-aad5-67fb7f68ba40 FLAKY 62/70 upstream reliability is '100.0'. current run reliability is '68.0'. drift is 32.0 and the allowed drift is set to 50. The test should PASS https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_with_restart

vbotbuildovich avatar Nov 20 '25 00:11 vbotbuildovich

/ci-repeat 1 dt-repeat=50 tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingReplicationTests.test_with_restart

bharathv avatar Nov 21 '25 03:11 bharathv

/ci-repeat 1 skip-redpanda-build skip-units dt-repeat=100 tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingReplicationTests.test_with_restart

bharathv avatar Nov 21 '25 05:11 bharathv

/ci-repeat 1 dt-repeat=75 tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingReplicationTests.test_with_restart

bharathv avatar Nov 24 '25 03:11 bharathv

Added @nvartolomei, who's been poking around kgo recently and might be interested.

dotnwat avatar Nov 27 '25 04:11 dotnwat

@bharathv I already account for "at least once" consume mode when consumer group restarts at https://github.com/redpanda-data/redpanda/blob/b576a0f7633f864d668eb55557459de4e0fd8761/tests/go/kgo-verifier/pkg/worker/verifier/group_read_worker.go#L197-L204

First, why is this not enough?

nvartolomei avatar Nov 27 '25 07:11 nvartolomei

> @bharathv I already account for "at least once" consume mode when consumer group restarts at
>
> https://github.com/redpanda-data/redpanda/blob/b576a0f7633f864d668eb55557459de4e0fd8761/tests/go/kgo-verifier/pkg/worker/verifier/group_read_worker.go#L197-L204
>
> First, why is this not enough?

Oh, I didn't know about this check. Now that I've read it, I don't understand how it is supposed to work. What err is consumerGroupReadInner expected to return if there was a duplicate read? Group rebalance is an async operation, so here is an example where this doesn't work:

t0 - fetched until offset=50 for p0
t1 - async offset committer - committed until offset 50 for p0
t2 - pollRecords() fetched until hwm (100) - waiting to fetch more (nothing to fetch)
t3 - Rebalance ran async and all partition assignments are revoked (will block offset commits)
t4 - Assigned again for p=0 at offset=50 (last committed) after syncGroup()
t5 - pollFetches() returns again from offset 50 (.. panic on duplicate read)

Does it work in a case like this?
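
To make the timeline concrete, here is a minimal, self-contained sketch that replays it against a toy validator. The `monotonicityValidator` type and `replayTimeline` function are hypothetical stand-ins, not the kgo-verifier code, and the validator returns an error where the real check would panic.

```go
package main

import "fmt"

// monotonicityValidator is a hypothetical stand-in for the verifier's
// per-partition check: it remembers the last offset seen for each partition
// and rejects any offset that does not move forward.
type monotonicityValidator struct {
	lastSeen map[int32]int64
}

func newMonotonicityValidator() *monotonicityValidator {
	return &monotonicityValidator{lastSeen: make(map[int32]int64)}
}

// observe returns an error (rather than panicking) so the sketch is safe to run.
func (v *monotonicityValidator) observe(partition int32, offset int64) error {
	if last, ok := v.lastSeen[partition]; ok && offset <= last {
		return fmt.Errorf("duplicate read on p%d: offset %d <= last seen %d", partition, offset, last)
	}
	v.lastSeen[partition] = offset
	return nil
}

// replayTimeline walks through t0..t5 and returns the first validation error.
func replayTimeline() error {
	v := newMonotonicityValidator()
	const p0 = int32(0)

	// t0, t2: the consumer reads offsets 0..99, i.e. up to the hwm of 100.
	for off := int64(0); off < 100; off++ {
		if err := v.observe(p0, off); err != nil {
			return err
		}
	}

	// t1: only offset 50 was committed before the rebalance started.
	committed := int64(50)

	// t3, t4: partitions are revoked and reassigned. No poll returned an
	// error, so nothing resets the validator state.
	// t5: fetching resumes at the committed offset, which was already seen.
	return v.observe(p0, committed)
}

func main() {
	fmt.Println(replayTimeline())
}
```

Because no poll ever returns an error in this sequence, an error-driven reset never fires, and the first record after reassignment trips the check.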

bharathv avatar Dec 01 '25 19:12 bharathv

@bharathv: What err is consumerGroupReadInner expected to return if there was a duplicate read?

It will panic rather than return.

@bharathv: Does it work in a case like this?

Thanks for the detailed explanation. No, the current logic does not account for this; I didn't consider this case. I considered only the cases where a broker dies and the next PollFetches returns an error, which we propagate and then call ResetMonotonicityOffsets for.

May I suggest that, instead of the implementation in this PR, we reset the "monotonicity validator state" for a partition via franz-go callbacks like

https://github.com/twmb/franz-go/blob/1eb651c40a997fbbe35c7da19686c57cd65ba352/pkg/kgo/config.go#L1798-L1816
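
A rough sketch of the idea suggested above, kept free of external imports so it runs on its own. The `offsetTracker` type and its methods are hypothetical; `onAssigned` only mirrors the topic-to-partitions map that franz-go passes to its partition assignment callbacks, and the wiring into the real client is left out.

```go
package main

import (
	"fmt"
	"sync"
)

type topicPartition struct {
	topic     string
	partition int32
}

// offsetTracker is a hypothetical per-partition monotonicity state. The mutex
// is there on the assumption that callbacks and polling may run on different
// goroutines.
type offsetTracker struct {
	mu       sync.Mutex
	lastSeen map[topicPartition]int64
}

func newOffsetTracker() *offsetTracker {
	return &offsetTracker{lastSeen: make(map[topicPartition]int64)}
}

// observe reports whether offset advances past the last one seen for the partition.
func (t *offsetTracker) observe(topic string, partition int32, offset int64) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	tp := topicPartition{topic: topic, partition: partition}
	if last, ok := t.lastSeen[tp]; ok && offset <= last {
		return false
	}
	t.lastSeen[tp] = offset
	return true
}

// onAssigned forgets the state of every newly assigned partition, so a
// replay from the last committed offset is not flagged as a duplicate.
func (t *offsetTracker) onAssigned(assigned map[string][]int32) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for topic, partitions := range assigned {
		for _, p := range partitions {
			delete(t.lastSeen, topicPartition{topic: topic, partition: p})
		}
	}
}

func main() {
	t := newOffsetTracker()
	for off := int64(0); off < 100; off++ {
		t.observe("topic", 0, off)
	}
	// Rebalance: the partition comes back and fetching resumes at offset 50.
	t.onAssigned(map[string][]int32{"topic": {0}})
	fmt.Println(t.observe("topic", 0, 50))
}
```

The trade-off is that the tracker deliberately stops treating records replayed across a reassignment as duplicates, while it still catches regressions within a single assignment.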

nvartolomei avatar Dec 02 '25 20:12 nvartolomei

> @bharathv: What err is consumerGroupReadInner expected to return if there was a duplicate read?
>
> It will panic rather than return.
>
> @bharathv: Does it work in a case like this?
>
> Thanks for the detailed explanation. No, the current logic does not account for this; I didn't consider this case. I considered only the cases where a broker dies and the next PollFetches returns an error, which we propagate and then call ResetMonotonicityOffsets for.
>
> May I suggest that, instead of the implementation in this PR, we reset the "monotonicity validator state" for a partition via franz-go callbacks like
>
> https://github.com/twmb/franz-go/blob/1eb651c40a997fbbe35c7da19686c57cd65ba352/pkg/kgo/config.go#L1798-L1816

Yeah, I think that's a cleaner approach. I took a stab at it and will re-request reviews once CI is green.

bharathv avatar Dec 05 '25 00:12 bharathv

Retry command for Build#77384

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkConsumeGroupsMirroringTest.test_consumer_groups_mirroring@{"source_cluster_spec":{"cluster_type":"redpanda"}}
tests/rptest/tests/consumer_group_recovery_tool_test.py::ConsumerOffsetsRecoveryToolTest.test_consumer_offsets_partition_count_change
tests/rptest/tests/consumer_group_recovery_test.py::ConsumerOffsetsRecoveryTest.test_consumer_offsets_partition_recovery@{"force_offset_upload_failures":false}
tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkConsumeGroupsMirroringTest.test_consumer_groups_mirroring@{"source_cluster_spec":{"cluster_type":"kafka","kafka_quorum":"COMBINED_KRAFT","kafka_version":"3.8.0"}}
tests/rptest/tests/consumer_group_recovery_test.py::ConsumerOffsetsRecoveryTest.test_consumer_offsets_partition_recovery@{"force_offset_upload_failures":true}

vbotbuildovich avatar Dec 05 '25 07:12 vbotbuildovich

@nvartolomei ready for review

bharathv avatar Dec 06 '25 00:12 bharathv

Retry command for Build#77471

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/partition_move_interruption_test.py::PartitionMoveInterruption.test_cancellations_interrupted_with_restarts@{"replication_factor":3}
tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingReplicationTests.test_topic_delete@{"source_cluster_spec":{"cluster_type":"redpanda"}}
tests/rptest/tests/consumer_offsets_consistency_test.py::ConsumerOffsetsConsistencyTest.test_flipping_leadership
tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingReplicationTests.test_topic_delete@{"source_cluster_spec":{"cluster_type":"kafka","kafka_quorum":"COMBINED_KRAFT","kafka_version":"3.8.0"}}

vbotbuildovich avatar Dec 06 '25 02:12 vbotbuildovich

/ci-repeat 1 skip-redpanda-build skip-units skip-rebase dt-repeat=50 tests/rptest/tests/cluster_linking_e2e_test.py::ShadowLinkingReplicationTests.test_with_restart

bharathv avatar Dec 08 '25 20:12 bharathv

This PR found a bunch of duplicate reads with the group consumer that the earlier implementation didn't catch. I'm starting to think the default exactly_once expectation should be set to false. With chaos, leader changes, restarts, and everything else going on in tests, it's pretty hard to avoid duplicate reads without transactions (which most tests don't use).
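
As an illustration of what such a default could mean for the check, here is a small sketch with a hypothetical `readChecker` whose `exactlyOnce` field stands in for the exactly_once expectation. It is an assumption about one possible shape, not the kgo-verifier implementation.

```go
package main

import "fmt"

// readChecker is a hypothetical validator. With exactlyOnce set, a repeated
// offset is an error; otherwise it is counted and tolerated (at-least-once).
type readChecker struct {
	exactlyOnce bool
	lastSeen    map[int32]int64
	duplicates  int
}

func newReadChecker(exactlyOnce bool) *readChecker {
	return &readChecker{exactlyOnce: exactlyOnce, lastSeen: make(map[int32]int64)}
}

func (c *readChecker) check(partition int32, offset int64) error {
	last, seen := c.lastSeen[partition]
	if seen && offset <= last {
		if c.exactlyOnce {
			return fmt.Errorf("duplicate read on p%d at offset %d", partition, offset)
		}
		// At-least-once: keep the high mark and just record the duplicate.
		c.duplicates++
		return nil
	}
	c.lastSeen[partition] = offset
	return nil
}

func main() {
	// A consumer that rewinds from offset 2 back to offset 1 after a restart.
	offsets := []int64{0, 1, 2, 1, 2, 3}

	lenient := newReadChecker(false)
	for _, off := range offsets {
		if err := lenient.check(0, off); err != nil {
			fmt.Println("lenient:", err)
		}
	}
	fmt.Println("lenient duplicates:", lenient.duplicates)

	strict := newReadChecker(true)
	for _, off := range offsets {
		if err := strict.check(0, off); err != nil {
			fmt.Println("strict:", err)
			break
		}
	}
}
```

In this shape, tests that use transactions could opt in to the strict mode, while the rest would only report a duplicate count.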

bharathv avatar Dec 09 '25 00:12 bharathv