redpanda
Fix timequery returning wrong offset after trim-prefix which could lead to stuck consumers
The fix works only if compression is not in use. We need follow-up work that decompresses the batches to find the exact offset to return, or (!) we could prevent a trim offset from falling inside a batch in that case.
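To make the failure mode concrete, here is a minimal Python sketch of a timequery that never answers below the partition start offset. It is not Redpanda's implementation: `Batch`, `timequery`, and the per-record timestamp tuple are hypothetical names chosen for illustration. It assumes uncompressed batches, where per-record timestamps can be read directly; with compression they would have to be decompressed first, which is the limitation noted above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Batch:
    """Hypothetical uncompressed batch: one timestamp per record."""
    base_offset: int
    timestamps: Tuple[int, ...]

    @property
    def last_offset(self) -> int:
        return self.base_offset + len(self.timestamps) - 1


def timequery(batches: Sequence[Batch], start_offset: int, ts: int) -> Optional[int]:
    """Return the first offset >= start_offset whose timestamp is >= ts."""
    for batch in batches:
        # Skip batches that lie entirely below the trimmed start offset.
        if batch.last_offset < start_offset:
            continue
        for i, record_ts in enumerate(batch.timestamps):
            offset = batch.base_offset + i
            # A trim can land inside a batch, so filter record by record.
            if offset >= start_offset and record_ts >= ts:
                return offset
    return None


batches = [Batch(0, (100, 101, 102, 103)), Batch(4, (104, 105))]
# After trimming to offset 2, a query for timestamp 100 must not return 0 or 1.
result = timequery(batches, start_offset=2, ts=100)
```

Returning the batch's base offset (0 here) instead would hand the consumer an offset below the partition start, which is the out-of-range situation this PR addresses.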
Backports Required
- [ ] none - not a bug fix
- [ ] none - this is a backport
- [ ] none - issue does not exist in previous branches
- [ ] none - papercut/not impactful enough to backport
- [x] v24.1.x
- [x] v23.3.x
- [ ] v23.2.x
Release Notes
Bug Fixes
- Fix a scenario where list_offset with a timestamp could return a lower offset than partition start after a trim-prefix command. This could lead to consumers being stuck with an out-of-range-offset exception if they began consuming from an offset below the one which was used in the trim-prefix command.
/dt
new failures in https://buildkite.com/redpanda/redpanda/builds/48359#018f1bfc-2f63-4411-a634-cf0b04ce2121:
"rptest.tests.timequery_test.TimeQueryTest.test_timequery_with_trim_prefix.cloud_storage=True.spillover=True"
new failures in https://buildkite.com/redpanda/redpanda/builds/48359#018f1bfc-2f6e-44ea-b952-03424f452652:
"rptest.tests.timequery_test.TimeQueryTest.test_timequery_with_trim_prefix.cloud_storage=True.spillover=False"
new failures in https://buildkite.com/redpanda/redpanda/builds/48359#018f1bfc-2f6a-4bb0-a615-ef80b9171807:
"rptest.tests.timequery_test.TimeQueryTest.test_timequery_with_trim_prefix.cloud_storage=False.spillover=False"
new failures in https://buildkite.com/redpanda/redpanda/builds/48359#018f1c03-5241-4440-8611-7211f0dc7557:
"rptest.tests.timequery_test.TimeQueryTest.test_timequery_with_trim_prefix.cloud_storage=True.spillover=True"
new failures in https://buildkite.com/redpanda/redpanda/builds/48359#018f1c03-523f-49ff-9ce6-c3d7891e976a:
"rptest.tests.timequery_test.TimeQueryTest.test_timequery_with_trim_prefix.cloud_storage=True.spillover=False"
new failures in https://buildkite.com/redpanda/redpanda/builds/48359#018f1c03-523c-4deb-b28c-f6da1f88bbb3:
"rptest.tests.timequery_test.TimeQueryTest.test_timequery_with_trim_prefix.cloud_storage=False.spillover=False"
new failures in https://buildkite.com/redpanda/redpanda/builds/48489#018f2fdb-6ae5-408d-a787-ec3ba9f51914:
"rptest.tests.timequery_test.TimeQueryTest.test_timequery_with_trim_prefix.cloud_storage=True.spillover=True"
new failures in https://buildkite.com/redpanda/redpanda/builds/48489#018f2fdb-6aed-42ff-95b3-6f79da4b9bc5:
"rptest.tests.cluster_config_test.ClusterConfigAliasTest.test_aliasing_with_upgrade.wipe_cache=False.prop_set=PropertyAliasData.primary_name=.cloud_storage_graceful_transfer_timeout_ms.aliased_name=.cloud_storage_graceful_transfer_timeout.redpanda_version=.23.2.test_values=.1234.1235.1236.expect_restart=False"
- Rebased on dev to resolve conflicts after a PR introduced by Willem to fix an unrelated timequery bug.
- Addressed reviewer's comments.
- Updated offset_range to bounded_offset_range to avoid misuse. It is less useful and more useful both at the same time!
Let's see what CI says.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48489#018f2fe3-c4cc-4176-acb7-f136b4e36f1f
- improve commit message as requested by @andrwng
- add a fix for an edge case where cloud storage shouldn't be read at all https://github.com/redpanda-data/redpanda/pull/18112/commits/f906e2480a47194242286a606b46389479f896fa
- commented out the trim-prefix test cases with tiered storage, as they run into an (existing) edge case which will be addressed in another PR
- Fix off-by-one error in reader max offset
- Rename bounded_offset_range to bounded_offset_interval and redesign the API to make it easier to use correctly/harder to misuse
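As a rough illustration of what "harder to misuse" can mean for an offset interval type, the Python sketch below validates its bounds at construction and makes the inclusive upper bound explicit in the field name, which is one way to guard against off-by-one mistakes. The real `bounded_offset_interval` lives in Redpanda's C++ code; the class name, field names, and invariants here are assumptions made for this example, not its actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedOffsetInterval:
    """Hypothetical closed interval [min_inclusive, max_inclusive] of offsets."""
    min_inclusive: int
    max_inclusive: int

    def __post_init__(self) -> None:
        # Reject invalid intervals up front so callers cannot build one by accident.
        if self.min_inclusive < 0:
            raise ValueError("min offset must be non-negative")
        if self.min_inclusive > self.max_inclusive:
            raise ValueError("min offset must not exceed max offset")

    def contains(self, offset: int) -> bool:
        # Both ends are inclusive; the field names say so explicitly.
        return self.min_inclusive <= offset <= self.max_inclusive


interval = BoundedOffsetInterval(min_inclusive=2, max_inclusive=5)
inside = interval.contains(5)
outside = interval.contains(6)
```

Because the object is frozen and checked in `__post_init__`, any interval that exists is known to be well-formed, so readers of the type do not need to re-validate it.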
The last 2 force-pushes fix some typos in the text.
Merging this to unblock https://github.com/redpanda-data/redpanda/pull/18097. Will address comments as follow ups.
/backport v24.1.x
/backport v23.3.x
Failed to create a backport PR to v23.3.x branch. I tried:
git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-18112-v23.3.x-984 remotes/upstream/v23.3.x
git cherry-pick -x 99d2bec5f7ee765cb3de88b446278262f6dae84f d97d61fb8a9b06fbde9dcf7ff03799bf200b561b 4f87afa392201c493f24f21af3e1cd7f0727649f f13bfa6c490490487d9a926c9a5d4e441adc3ca6 76a1ea2452b09a5730f2574646fc06ab2b8b8e32 f9ed5cabe479b355d370bec6bc9b693ad2928f3c a40999d2a09e0586c3fa81521c4d5fb5d0abc9dc 8f2de964c0e915f4f10ae8eb74400e6288c5680f