[CORE-13254] ct: calculate global gc eligible L0 epoch
Computes the top-level, safe-to-delete L0 GC epoch value. There is a very simple ducktape test that verifies that deletes are occurring. I want to get better testing into the ducktape setup for 25.3.x. I struggled a lot with figuring out how to get a fixture test working, so that's also on the todo list, but it may not be necessary for 25.3.x. Oren is going to take over getting this across the GA line.
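The description does not say how per-partition values are combined into the top-level epoch. The sketch below assumes that each partition reports the highest epoch whose L0 objects it no longer needs (`None` if it has not reported yet). It also assumes a single L0 object can hold data for several partitions. Under those assumptions, the global safe-to-delete value is the minimum across partitions. `global_gc_eligible_epoch` and `PartitionId` are hypothetical names, and the real implementation is C++ and may differ.

```python
from typing import Mapping, Optional

# Hypothetical partition identifier: (topic, partition_id).
PartitionId = tuple[str, int]


def global_gc_eligible_epoch(
    per_partition: Mapping[PartitionId, Optional[int]],
) -> Optional[int]:
    """Return the highest L0 epoch that is safe to GC cluster-wide.

    Assumption (not from the PR): each value is the highest epoch whose L0
    objects that partition no longer needs, or None if it has not reported.
    An L0 object may be shared by several partitions, so it is only safe to
    delete once every partition is past its epoch, i.e. the minimum.
    """
    if not per_partition:
        # Conservative choice: with no information, allow no deletes.
        return None
    epochs = list(per_partition.values())
    if any(e is None for e in epochs):
        # A partition that has not reported may still reference old L0 data.
        return None
    return min(epochs)
```

For example, with partitions reporting 12, 9, and 15, only epochs up to 9 are eligible. A single unreported partition blocks GC entirely, trading delayed deletes for never removing data a lagging partition still needs.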
Fixes: https://redpandadata.atlassian.net/browse/CORE-13254
Fixes: https://redpandadata.atlassian.net/browse/CORE-14824
Backports Required
- [ ] none - not a bug fix
- [ ] none - this is a backport
- [ ] none - issue does not exist in previous branches
- [ ] none - papercut/not impactful enough to backport
- [x] v25.3.x
- [ ] v25.2.x
- [ ] v25.1.x
- [ ] v24.3.x
Release Notes
- none
CI test results
test results on build#75616
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| ShadowLinkingReplicationTests | test_auto_prefix_trimming | {"source_cluster_spec": {"cluster_type": "redpanda"}, "with_failures": true} | integration | https://buildkite.com/redpanda/redpanda/builds/75616#019a51ce-6be0-47e6-aece-772084f8160d | FLAKY | 20/21 | upstream reliability is '92.46298788694482'. current run reliability is '95.23809523809523'. drift is -2.77511 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming |
| ConsumerGroupBalancingTest | test_coordinator_nodes_balance | null | integration | https://buildkite.com/redpanda/redpanda/builds/75616#019a520a-46e7-43b8-9832-ca8dc38fda11 | FLAKY | 20/21 | upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ConsumerGroupBalancingTest&test_method=test_coordinator_nodes_balance |
| MountUnmountIcebergTest | test_simple_remount | {"cloud_storage_type": 1} | integration | https://buildkite.com/redpanda/redpanda/builds/75616#019a520a-46e7-43b8-9832-ca8dc38fda11 | FLAKY | 17/21 | upstream reliability is '97.68421052631578'. current run reliability is '80.95238095238095'. drift is 16.73183 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=MountUnmountIcebergTest&test_method=test_simple_remount |
| SegmentMsTest | test_segment_rolling_with_retention_consumer | null | integration | https://buildkite.com/redpanda/redpanda/builds/75616#019a520a-46e0-4f42-95ac-5a05a7aad866 | FLAKY | 16/21 | upstream reliability is '94.73039215686273'. current run reliability is '76.19047619047619'. drift is 18.53992 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=SegmentMsTest&test_method=test_segment_rolling_with_retention_consumer |
| ShadowLinkingRandomOpsTest | test_node_operations | {"failures": false} | integration | https://buildkite.com/redpanda/redpanda/builds/75616#019a520a-46e5-469d-b45a-9f86c076b39e | FLAKY | 19/21 | upstream reliability is '99.69183359013869'. current run reliability is '90.47619047619048'. drift is 9.21564 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingRandomOpsTest&test_method=test_node_operations |
| ShadowLinkingRandomOpsTest | test_node_operations | {"failures": true} | integration | https://buildkite.com/redpanda/redpanda/builds/75616#019a520a-46e7-43b8-9832-ca8dc38fda11 | FLAKY | 16/21 | upstream reliability is '100.0'. current run reliability is '76.19047619047619'. drift is 23.80952 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingRandomOpsTest&test_method=test_node_operations |
| TxUpgradeCompactionTest | upgrade_with_compaction_test | null | integration | https://buildkite.com/redpanda/redpanda/builds/75616#019a520a-46e4-43cb-92f6-79e8606ebb10 | FLAKY | 20/21 | upstream reliability is '99.3006993006993'. current run reliability is '95.23809523809523'. drift is 4.0626 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TxUpgradeCompactionTest&test_method=upgrade_with_compaction_test |
| src/v/storage/tests/storage_e2e_fixture_test | src/v/storage/tests/storage_e2e_fixture_test | | unit | https://buildkite.com/redpanda/redpanda/builds/75616#019a51b3-efb8-46e8-beee-bb3fc6525973 | FAIL | 0/1 | | |
test results on build#77292
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| PartitionBalancerTest | test_unavailable_nodes | null | integration | https://buildkite.com/redpanda/redpanda/builds/77292#019ae6ba-f6f3-41e8-aef3-6bcc290b5bbd | FLAKY | 20/21 | upstream reliability is '100.0'. current run reliability is '95.23809523809523'. drift is 4.7619 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionBalancerTest&test_method=test_unavailable_nodes |
test results on build#77379
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| ShadowLinkingReplicationTests | test_topic_delete | {"source_cluster_spec": {"cluster_type": "redpanda"}} | integration | https://buildkite.com/redpanda/redpanda/builds/77379#019aecf0-a435-4228-bd79-047f44788f61 | FLAKY | 14/21 | upstream reliability is '100.0'. current run reliability is '66.66666666666666'. drift is 33.33333 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_topic_delete |
| NodesDecommissioningTest | test_decommissioning_rebalancing_node | {"shutdown_decommissioned": false} | integration | https://buildkite.com/redpanda/redpanda/builds/77379#019aecf0-a42f-4f66-97db-0524ba4827e7 | FLAKY | 19/21 | upstream reliability is '93.16239316239316'. current run reliability is '90.47619047619048'. drift is 2.6862 and the allowed drift is set to 50. The test should PASS | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_decommissioning_rebalancing_node |
test results on build#77607
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| ReplicatedMetastoreTest | TestBasicRemoveTopics | | unit | https://buildkite.com/redpanda/redpanda/builds/77607#019b04f7-e94d-4b1e-9f15-4fa9daac43bd | FAIL | 0/1 | | |
| ReplicatedMetastoreTest | TestBasicRemoveTopics | | unit | https://buildkite.com/redpanda/redpanda/builds/77607#019b056f-20b7-4045-a77f-c2662d2e9e1c | FAIL | 0/1 | | |
| ScalingUpTest | test_fast_node_addition | null | integration | https://buildkite.com/redpanda/redpanda/builds/77607#019b053b-7691-400f-a3da-8fd35afccb96 | FLAKY | 29/31 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0292, p0=0.5889, reject_threshold=0.0100, adj_baseline=0.0851, p1=0.2632, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_fast_node_addition |
test results on build#77631
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| NodesDecommissioningTest | test_decommissioning_rebalancing_node | {"shutdown_decommissioned": false} | integration | https://buildkite.com/redpanda/redpanda/builds/77631#019b067d-b219-4fcc-afc4-1e79670ba2b4 | FLAKY | 12/21 | Test FAILS after retries. Significant increase in flaky rate (baseline=0.1046, p0=0.0006, reject_threshold=0.0100) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodesDecommissioningTest&test_method=test_decommissioning_rebalancing_node |
| WriteCachingFailureInjectionTest | test_unavoidable_data_loss | null | integration | https://buildkite.com/redpanda/redpanda/builds/77631#019b067d-b216-4018-a739-ea109bb608db | FLAKY | 18/21 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0549, p0=0.3012, reject_threshold=0.0100, adj_baseline=0.1558, p1=0.3771, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionTest&test_method=test_unavoidable_data_loss |
Retry command for Build#77631
Please wait until all jobs are finished before running the slash command:
/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/nodes_decommissioning_test.py::NodesDecommissioningTest.test_decommissioning_rebalancing_node@{"shutdown_decommissioned":false}
/backport v25.3.x
@Lazin
I think we need to check whether the snapshot we created by looking at the topic table is actually consistent. The controller offset of the snapshot is the upper bound for the GC epoch. Do we enforce this invariant somewhere?
The belief is that it is consistent because updates to the topics table are themselves commands in the controller log. But I agree, we should have more constraints. One simple thing to do is to sample before and after taking the snapshot and make sure nothing changed. Kind of like a sequence lock?
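The sequence-lock idea from the reply can be sketched as follows. Everything here is hypothetical: `TopicTable`, `seq`, `last_applied_offset`, `take_consistent_snapshot`, and `check_gc_epoch` are illustration names, not Redpanda's API. The real topic table is C++ running on Seastar. The sketch assumes every topic-table mutation comes from a controller command and is bracketed by two counter bumps. Under that assumption, a reader that sees the same even counter value before and after copying knows no command landed mid-copy. The sketch also encodes the reviewer's invariant that the GC epoch must not exceed the snapshot's controller offset.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TopicTable:
    """Hypothetical stand-in for the controller topic table (not Redpanda's API)."""

    topics: Dict[str, int] = field(default_factory=dict)  # topic -> partition count
    last_applied_offset: int = -1  # controller offset of the last applied command
    seq: int = 0  # sequence counter: odd while an update is in progress

    def apply(self, offset: int, topic: str, partitions: int) -> None:
        """Apply one controller command, bracketing the mutation with seq bumps."""
        self.seq += 1  # odd: any copy taken now must be retried
        self.topics[topic] = partitions
        self.last_applied_offset = offset
        self.seq += 1  # even: update complete


@dataclass(frozen=True)
class Snapshot:
    topics: Dict[str, int]
    controller_offset: int


class SnapshotUnstable(Exception):
    """Raised when the table changed during every attempt."""


def take_consistent_snapshot(
    table: TopicTable,
    max_attempts: int = 5,
    during_copy: Optional[Callable[[], None]] = None,
) -> Snapshot:
    for _ in range(max_attempts):
        before = table.seq
        if before % 2 == 1:
            continue  # a writer is mid-update; sample again
        topics = copy.deepcopy(table.topics)
        if during_copy is not None:
            # Stands in for a scheduling point (e.g. an await) where another
            # task could apply a controller command between the two reads.
            during_copy()
        offset = table.last_applied_offset
        if table.seq == before:
            # No command landed between the samples: topics and offset agree.
            return Snapshot(topics, offset)
    raise SnapshotUnstable(f"topic table changed during {max_attempts} attempts")


def check_gc_epoch(snapshot: Snapshot, gc_epoch: int) -> int:
    """Reviewer's invariant: the snapshot's controller offset bounds the GC epoch."""
    if gc_epoch > snapshot.controller_offset:
        raise ValueError(
            f"GC epoch {gc_epoch} exceeds snapshot offset {snapshot.controller_offset}"
        )
    return gc_epoch
```

Without the before/after comparison, a command applied at the `during_copy` point would yield a torn snapshot: topics from before the command paired with the offset from after it. On a single Seastar shard, such interleaving can only happen if building the snapshot crosses a scheduling point. A multi-threaded seqlock would additionally need memory fences around the reads.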