redpanda
Fix timequery not returning results when racing with archival retention and gc
The meat of the PR:
Tiered Storage can physically hold a superset of the addressable data. At least two things cause this: a) trim-prefix, b) retention having been applied while garbage collection has not finished yet.
For offset queries this isn't a problem because the bounds can be applied at a higher level. In particular, the partition object validates that the offset is in range before passing control to the remote partition.
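The higher-level check described above can be pictured with a minimal sketch. This is an illustration only, not Redpanda's code: `Partition`, `OffsetOutOfRange`, `offset_query`, and `remote_read` are hypothetical names, and the bounds reuse the [1000, 1200] example from this description.

```python
from dataclasses import dataclass
from typing import Callable


class OffsetOutOfRange(Exception):
    """Hypothetical error for offsets outside the addressable range."""


@dataclass
class Partition:
    # Hypothetical field names: the visible/addressable offset bounds.
    start_offset: int
    last_offset: int

    def offset_in_range(self, offset: int) -> bool:
        return self.start_offset <= offset <= self.last_offset


def offset_query(partition: Partition, offset: int,
                 remote_read: Callable[[int], str]) -> str:
    # Bounds are enforced here, before the remote partition is consulted,
    # so extra physical data in cloud storage is never reached.
    if not partition.offset_in_range(offset):
        raise OffsetOutOfRange(offset)
    return remote_read(offset)


partition = Partition(start_offset=1000, last_offset=1200)
reads = []  # records which offsets actually reached the remote reader


def remote_read(offset: int) -> str:
    reads.append(offset)
    return f"record@{offset}"
```

In this sketch a query for offset 350 is rejected before `remote_read` is called, even if cloud storage still physically holds that offset.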
Prior to this commit, such bounds were not enforced for timequeries. This led to a bug where cloud storage would either return offset -1 (no data found) when there actually was data, or return a wrong offset.
Wrong offset: returned because reads could start before the partition's visible/addressable start offset, e.g. after retention was applied but before GC ran, or after a trim-prefix to an offset that falls in the middle of a batch.
Missing offset: returned when the higher-level reader was created with the visible/addressable partition offset bounds, say [1000, 1200], but cloud storage found the offset in a manifest with bounds [300, 400]. This led to an out-of-range error, which used to be ignored.
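The two failure modes can be reproduced in a toy model. The sketch below is not the actual cloud storage code: `Batch`, `timequery_unbounded`, `timequery_bounded`, and the timestamps are made up for illustration, and a plain list of batches stands in for segments and manifests. It only shows the idea of applying the addressable offset bounds to a timequery.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Batch:
    base_offset: int
    last_offset: int
    max_timestamp: int


def timequery_unbounded(batches: List[Batch], ts: int) -> Optional[int]:
    # Scans everything that is physically present, including data that
    # retention or trim-prefix has already made unaddressable.
    for b in batches:
        if b.max_timestamp >= ts:
            return b.base_offset
    return None


def timequery_bounded(batches: List[Batch], ts: int,
                      min_offset: int, max_offset: int) -> Optional[int]:
    for b in batches:
        if b.last_offset < min_offset:
            continue  # entirely below the addressable range
        if b.base_offset > max_offset:
            break     # entirely above the addressable range
        if b.max_timestamp >= ts:
            # Clamp in case min_offset falls in the middle of this batch.
            return max(b.base_offset, min_offset)
    return None


# Physical data is a superset: [300, 999] is retained but not yet GC'ed.
physical = [
    Batch(300, 399, 10),
    Batch(400, 999, 20),
    Batch(1000, 1099, 30),
    Batch(1100, 1200, 40),
]
```

With addressable bounds [1000, 1200] and a query timestamp of 5, `timequery_unbounded(physical, 5)` yields 300, an offset that is no longer addressable, while `timequery_bounded(physical, 5, 1000, 1200)` yields 1000. With a trim-prefix at 1050, which falls inside the batch [1000, 1099], the bounded variant yields 1050.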
Fixes #15312
Backports Required
- [ ] none - not a bug fix
- [ ] none - this is a backport
- [ ] none - issue does not exist in previous branches
- [ ] none - papercut/not impactful enough to backport
- [x] v24.1.x
- [x] v23.3.x
- [ ] v23.2.x
Release Notes
Bug Fixes
- Fix an edge case where a timequery returns no results if it races with tiered storage retention and garbage collection. This matters at least for consumers that fall behind retention: they interpret such a response as meaning the partition is empty and jump to the HWM instead of resuming consumption from the first available message.
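To make the consumer impact concrete, here is a minimal sketch of the behaviour described in the release note. `resume_position` is a hypothetical helper, not a real client API, and the offsets are assumed example values; the -1 "no data found" result is the one mentioned in the PR description.

```python
NO_DATA = -1  # "no data found" offset, as in the PR description


def resume_position(timequery_offset: int, high_watermark: int) -> int:
    # A consumer that treats "no data" as "partition is empty" jumps to
    # the high watermark and waits for new messages.
    if timequery_offset == NO_DATA:
        return high_watermark
    return timequery_offset


# Assumed example: addressable offsets are [1000, 1200], so HWM is 1201.
first_available = 1000
high_watermark = 1201

buggy = resume_position(NO_DATA, high_watermark)          # timequery lost the race
fixed = resume_position(first_available, high_watermark)  # timequery found data

skipped_messages = buggy - fixed
```

In this example the consumer that received -1 resumes at 1201 and never reads the 201 messages at offsets 1000 through 1200.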
/dt
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48329#018f1b1c-8dba-41d7-954c-3c44be525930
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48358#018f1bd0-e072-46e4-a8cb-017eea7d72a3
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/48825#018f588d-d0a5-4228-b42b-80104b665155
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/49276#018f8613-739f-4646-a285-6990ca17adee
/dt
/dt
new failures in https://buildkite.com/redpanda/redpanda/builds/48825#018f588d-d0a5-4228-b42b-80104b665155:
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"
new failures in https://buildkite.com/redpanda/redpanda/builds/48825#018f588d-d0a8-4d73-acb2-d953fa66d760:
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"
new failures in https://buildkite.com/redpanda/redpanda/builds/48825#018f587e-972f-4040-8407-643d3582f487:
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"
new failures in https://buildkite.com/redpanda/redpanda/builds/48825#018f587e-972d-4422-b0fb-cf6d38dfd19f:
"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"
/dt
/dt
/backport v24.1.x
/backport v23.3.x
Failed to create a backport PR to v24.1.x branch. I tried:
git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-18097-v24.1.x-819 remotes/upstream/v24.1.x
git cherry-pick -x 4c706fb144441e1dfd1d030fda64cfd13cc8a9f5 aab5fe7db7c96d1623a1e56e5263f887a5016fb9 b53deac17c7f6bb2d3f1163965e4ef1397dafcac d1543eeb8c929eccb414da678445fa0981fca075 41eed623536a38cdfaf3f3bc193ffcef51a8fb71 c5eb52d80e7a4833ff03e14bb72502cb7c0373d8 680a67e5e644d5ddc65c6808bec3a50e0048dcda 3a9058ab1fc0c475e89a2408325e3c1b835c0ebd 0735bdfdabf8dde7b95303e9b9b28d8851d9eb9e 9846ed93e5468b7655c55e5c376fb21c03bf1bb8 5ae5fcd91b4c1ae3dfa0da8e9db0748a6c797e2b 943aa52273ae85a7dc37d55eff1ec768a9ef122e 1d7a1e3e2827441b0749a0e40097c4be9d65a3a7
Failed to create a backport PR to v23.3.x branch. I tried:
git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-18097-v23.3.x-981 remotes/upstream/v23.3.x
git cherry-pick -x 4c706fb144441e1dfd1d030fda64cfd13cc8a9f5 aab5fe7db7c96d1623a1e56e5263f887a5016fb9 b53deac17c7f6bb2d3f1163965e4ef1397dafcac d1543eeb8c929eccb414da678445fa0981fca075 41eed623536a38cdfaf3f3bc193ffcef51a8fb71 c5eb52d80e7a4833ff03e14bb72502cb7c0373d8 680a67e5e644d5ddc65c6808bec3a50e0048dcda 3a9058ab1fc0c475e89a2408325e3c1b835c0ebd 0735bdfdabf8dde7b95303e9b9b28d8851d9eb9e 9846ed93e5468b7655c55e5c376fb21c03bf1bb8 5ae5fcd91b4c1ae3dfa0da8e9db0748a6c797e2b 943aa52273ae85a7dc37d55eff1ec768a9ef122e 1d7a1e3e2827441b0749a0e40097c4be9d65a3a7