
Data Migrations: allow concurrent group migrations

Open bashtanov opened this issue 1 month ago • 2 comments

Currently, migrations involving consumer groups cannot run concurrently, even if they affect entirely distinct groups. Allow this.

This requires some changes to the backend and worker data structures. Previously, it was assumed that a topic or a partition is involved in at most one migration, but a consumer group topic partition may be touched by multiple migrations.
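A minimal sketch of the data structure change described above, not the actual Redpanda code: instead of mapping each partition to a single owning migration, the index maps each partition to the set of migrations that currently reference it. `MigrationIndex`, `attach`, `detach`, and `migrations_for` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical types for illustration; not Redpanda's actual structures.
MigrationId = int
Partition = tuple[str, int]  # (topic name, partition index)


@dataclass
class MigrationIndex:
    # A one-owner model would be dict[Partition, MigrationId].
    # Here a partition may be referenced by several migrations at once.
    by_partition: dict[Partition, set[MigrationId]] = field(
        default_factory=lambda: defaultdict(set)
    )

    def attach(self, partition: Partition, migration: MigrationId) -> None:
        """Record that `migration` touches `partition`."""
        self.by_partition[partition].add(migration)

    def detach(self, partition: Partition, migration: MigrationId) -> None:
        """Drop one migration; forget the partition once nobody references it."""
        owners = self.by_partition.get(partition)
        if owners is None:
            return
        owners.discard(migration)
        if not owners:
            del self.by_partition[partition]

    def migrations_for(self, partition: Partition) -> frozenset[MigrationId]:
        """Return every migration currently touching `partition`."""
        return frozenset(self.by_partition.get(partition, ()))
```

With this shape, two migrations over distinct groups that happen to hash to the same consumer group topic partition can both be attached, and the partition entry is only released when the last of them detaches.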

Backports Required

  • [ ] none - not a bug fix
  • [ ] none - this is a backport
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [x] v25.3.x
  • [ ] v25.2.x
  • [ ] v25.1.x
  • [ ] v24.3.x

Release Notes

  • none

bashtanov, Dec 04 '25 00:12

CI test results

test results on build#77294

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MountUnmountIcebergTest | test_simple_remount | {"cloud_storage_type": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/77294#019ae6f8-89a7-4b2a-8a70-8d6bf33b3214 | FLAKY | 13/21 | upstream reliability is '78.61271676300578'. current run reliability is '61.904761904761905'. drift is 16.70795 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=MountUnmountIcebergTest&test_method=test_simple_remount |
| PartitionForceReconfigurationTest | test_basic_reconfiguration | {"acks": 1, "controller_snapshots": true, "restart": true} | integration | https://buildkite.com/redpanda/redpanda/builds/77294#019ae6ea-0dff-46ac-bb12-a8dcd795f2a5 | FLAKY | 20/21 | upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionForceReconfigurationTest&test_method=test_basic_reconfiguration |
| WriteCachingFailureInjectionE2ETest | test_crash_all | {"use_transactions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/77294#019ae6f8-89a6-461d-a215-01cf952c5f3c | FLAKY | 15/21 | upstream reliability is '89.24843423799582'. current run reliability is '71.42857142857143'. drift is 17.81986 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all |

test results on build#77763

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FeaturesMultiNodeUpgradeTest | test_rollback | null | integration | https://buildkite.com/redpanda/redpanda/builds/77763#019b1206-c15b-4803-98c8-952f12e93b4c | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.4097, p0=0.0001, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesMultiNodeUpgradeTest&test_method=test_rollback |
| FeaturesMultiNodeUpgradeTest | test_upgrade | null | integration | https://buildkite.com/redpanda/redpanda/builds/77763#019b1206-c15c-4edb-8fec-664c1010becb | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.4089, p0=0.0001, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesMultiNodeUpgradeTest&test_method=test_upgrade |
| FeaturesSingleNodeUpgradeTest | test_upgrade | null | integration | https://buildkite.com/redpanda/redpanda/builds/77763#019b1206-c156-44b0-b840-02bd49a12221 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.4223, p0=0.0002, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=FeaturesSingleNodeUpgradeTest&test_method=test_upgrade |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": false, "clean_node_before_recovery": false} | integration | https://buildkite.com/redpanda/redpanda/builds/77763#019b1206-c15c-4edb-8fec-664c1010becb | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.4149, p0=0.0002, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": false, "clean_node_before_recovery": true} | integration | https://buildkite.com/redpanda/redpanda/builds/77763#019b1206-c15f-4370-9ab0-f98d6d63b022 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.4155, p0=0.0002, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |
| LicenseEnforcementTest | test_license_enforcement | {"clean_node_after_recovery": true, "clean_node_before_recovery": true} | integration | https://buildkite.com/redpanda/redpanda/builds/77763#019b1206-c160-4cc2-8940-13bd1297be39 | FAIL | 0/11 | Test FAILS after retries. Significant increase in flaky rate(baseline=0.4151, p0=0.0002, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=LicenseEnforcementTest&test_method=test_license_enforcement |

test results on build#77766

| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AuditLogTestSchemaRegistryACLs | test_sr_audit_authz | {"audit_transport_mode": "kclient", "endpoint_name": "PUT_MODE"} | integration | https://buildkite.com/redpanda/redpanda/builds/77766#019b1284-e768-43a2-835e-981b2ca0a707 | FLAKY | 50/51 | Test PASSES after retries. Inconclusive result after max retries(baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.0000, p1=1.0000, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AuditLogTestSchemaRegistryACLs&test_method=test_sr_audit_authz |
| PartitionReassignmentsTest | test_reassignments_cancel | null | integration | https://buildkite.com/redpanda/redpanda/builds/77766#019b1284-e769-42e1-885b-242adfc7d686 | FLAKY | 8/11 | Test PASSES after retries. No significant increase in flaky rate(baseline=0.1166, p0=0.3285, reject_threshold=0.0100. adj_baseline=0.3106, p1=0.3551, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionReassignmentsTest&test_method=test_reassignments_cancel |

vbotbuildovich, Dec 04 '25 03:12

Can you add a high-level overview to the commit comments?

joe-redpanda, Dec 09 '25 20:12

Retry command for Build#77763

Please wait until all jobs are finished before running the slash command.

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/cluster_features_test.py::FeaturesSingleNodeUpgradeTest.test_upgrade
tests/rptest/tests/cluster_features_test.py::FeaturesMultiNodeUpgradeTest.test_rollback
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":false,"clean_node_before_recovery":true}
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":true,"clean_node_before_recovery":true}
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":false,"clean_node_before_recovery":false}
tests/rptest/tests/cluster_features_test.py::FeaturesMultiNodeUpgradeTest.test_upgrade

vbotbuildovich, Dec 12 '25 11:12

/ci-repeat 1

bashtanov, Dec 12 '25 11:12