CI Failure (Internal object storage scrub detected fatal anomalies) in `ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy`

Open andijcr opened this issue 1 year ago • 14 comments

https://buildkite.com/redpanda/vtools/builds/10650 https://buildkite.com/redpanda/vtools/builds/10846

Module: rptest.tests.e2e_shadow_indexing_test
Class: ShadowIndexingWhileBusyTest
Method: test_create_or_delete_topics_while_busy
Arguments: {
    "short_retention": true,
    "cloud_storage_type": 1
}
test_id:    ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy
status:     FAIL
run time:   789.483 seconds

RuntimeError("Internal object storage scrub detected fatal anomalies: [{'ns': 'kafka', 'topic': 'topic-zmilrsrinp', 'partition': 36, 'revision_id': 51, 'missing_segments': ['22e19e71/kafka/topic-zmilrsrinp/36_51/5185-5226-20977506-1-v1.log.1', '36b4b4a9/kafka/topic-zmilrsrinp/36_51/5145-5184-20977360-1-v1.log.1', '2526f118/kafka/topic-zmilrsrinp/36_51/4945-4984-20977360-1-v1.log.1'], 'last_complete_scrub_at': 1700085196023}, {'ns': 'kafka', 'topic': 'topic-zmilrsrinp', 'partition': 12, 'revision_id': 51, 'missing_segments': ['f3d2be80/kafka/topic-zmilrsrinp/12_51/2896-2937-20977639-1-v1.log.1', '65c66587/kafka/topic-zmilrsrinp/12_51/3104-3147-20977918-1-v1.log.1', 'db7677ea/kafka/topic-zmilrsrinp/12_51/3778-3819-20977639-1-v1.log.1', 'ee6d0524/kafka/topic-zmilrsrinp/12_51/4617-4656-20977360-1-v1.log.1', '2ec0f06f/kafka/topic-zmilrsrinp/12_51/3694-3735-20977639-1-v1.log.1', 'eae7d342/kafka/topic-zmilrsrinp/12_51/1131-1174-20977918-1-v1.log.1', 'f5d03a8f/kafka/topic-zmilrsrinp/12_51/84-125-20977637-1-v1.log.1', 'a976720f/kafka/topic-zmilrsrinp/12_51/1469-1510-20977639-1-v1.log.1', 'cc3e11ca/kafka/topic-zmilrsrinp/12_51/1677-1718-20977639-1-v1.log.1', '76567a90/kafka/topic-zmilrsrinp/12_51/1931-1972-20977639-1-v1.log.1', '224cbf80/kafka/topic-zmilrsrinp/12_51/3610-3651-20977639-1-v1.log.1', 'b71440c6/kafka/topic-zmilrsrinp/12_51/168-209-20977638-1-v1.log.1', 'be8707b2/kafka/topic-zmilrsrinp/12_51/5189-5228-20977360-1-v1.log.1', 'de7fd964/kafka/topic-zmilrsrinp/12_51/4657-4700-20977918-1-v1.log.1', 'd58bb90a/kafka/topic-zmilrsrinp/12_51/2141-2182-20977639-1-v1.log.1', '1871d1ac/kafka/topic-zmilrsrinp/12_51/2225-2266-20977639-1-v1.log.1', 'a6252834/kafka/topic-zmilrsrinp/12_51/2477-2518-20977639-1-v1.log.1', 'b7a7001d/kafka/topic-zmilrsrinp/12_51/755-796-20977638-1-v1.log.1', '10a2a9c3/kafka/topic-zmilrsrinp/12_51/4869-4908-20977360-1-v1.log.1', '90725443/kafka/topic-zmilrsrinp/12_51/1385-1426-20977639-1-v1.log.1', 
'8090ce86/kafka/topic-zmilrsrinp/12_51/4155-4196-20977639-1-v1.log.1', 'bbc81a72/kafka/topic-zmilrsrinp/12_51/921-964-20977916-1-v1.log.1', 'bdc82106/kafka/topic-zmilrsrinp/12_51/2938-2979-20977639-1-v1.log.1', '42d4d859/kafka/topic-zmilrsrinp/12_51/5149-5188-20977360-1-v1.log.1', '9809514d/kafka/topic-zmilrsrinp/12_51/3400-3441-20977639-1-v1.log.1', '2348add4/kafka/topic-zmilrsrinp/12_51/5069-5108-20977360-1-v1.log.1', '435ff1fb/kafka/topic-zmilrsrinp/12_51/4239-4280-20977639-1-v1.log.1', '1d16b05d/kafka/topic-zmilrsrinp/12_51/2686-2727-20977639-1-v1.log.1', 'ce7b3ccc/kafka/topic-zmilrsrinp/12_51/4071-4112-20977639-1-v1.log.1', '06e502f3/kafka/topic-zmilrsrinp/12_51/419-460-20977638-1-v1.log.1', '3904e663/kafka/topic-zmilrsrinp/12_51/1427-1468-20977639-1-v1.log.1', '90d342ae/kafka/topic-zmilrsrinp/12_51/4029-4070-20977639-1-v1.log.1', '960cc049/kafka/topic-zmilrsrinp/12_51/545-586-20977638-1-v1.log.1', 'bc059ce3/kafka/topic-zmilrsrinp/12_51/3232-3272-20977562-1-v1.log.1', '4f3cad8a/kafka/topic-zmilrsrinp/12_51/4909-4948-20977360-1-v1.log.1', '6c0f4d7a/kafka/topic-zmilrsrinp/12_51/965-1004-20977360-1-v1.log.1', '20ac88bf/kafka/topic-zmilrsrinp/12_51/1343-1384-20977639-1-v1.log.1', '41adbf3b/kafka/topic-zmilrsrinp/12_51/2183-2224-20977639-1-v1.log.1', '8e47f47e/kafka/topic-zmilrsrinp/12_51/4742-4784-20977716-1-v1.log.1', 'cee530ee/kafka/topic-zmilrsrinp/12_51/3148-3189-20977639-1-v1.log.1', 'bbf8b90e/kafka/topic-zmilrsrinp/12_51/332-376-20978056-1-v1.log.1', '61ee93ad/kafka/topic-zmilrsrinp/12_51/1889-1930-20977639-1-v1.log.1', 'fbc3c6e0/kafka/topic-zmilrsrinp/12_51/671-712-20977638-1-v1.log.1', '86127870/kafka/topic-zmilrsrinp/12_51/2770-2811-20977639-1-v1.log.1', 'c8daa35c/kafka/topic-zmilrsrinp/12_51/2267-2308-20977639-1-v1.log.1', 'f294929e/kafka/topic-zmilrsrinp/12_51/3358-3399-20977639-1-v1.log.1', '103d2365/kafka/topic-zmilrsrinp/12_51/1089-1130-20977639-1-v1.log.1', '854f1683/kafka/topic-zmilrsrinp/12_51/1763-1802-20977360-1-v1.log.1', 
'faffe22f/kafka/topic-zmilrsrinp/12_51/4197-4238-20977639-1-v1.log.1', '676b902a/kafka/topic-zmilrsrinp/12_51/2728-2769-20977639-1-v1.log.1', '87fa5103/kafka/topic-zmilrsrinp/12_51/3820-3859-20977360-1-v1.log.1', '07f8253d/kafka/topic-zmilrsrinp/12_51/879-920-20977638-1-v1.log.1', '599d62e5/kafka/topic-zmilrsrinp/12_51/4785-4826-20977639-1-v1.log.1', 'e3ef074b/kafka/topic-zmilrsrinp/12_51/2519-2560-20977639-1-v1.log.1', 'e2ab6da8/kafka/topic-zmilrsrinp/12_51/4989-5028-20977360-1-v1.log.1', '97abc74f/kafka/topic-zmilrsrinp/12_51/210-251-20977638-1-v1.log.1', '667482f9/kafka/topic-zmilrsrinp/12_51/501-544-20977916-1-v1.log.1', 'ae5f95a8/kafka/topic-zmilrsrinp/12_51/4575-4616-20977639-1-v1.log.1', 'ea8d3d36/kafka/topic-zmilrsrinp/12_51/126-167-20977637-1-v1.log.1', 'ad77531c/kafka/topic-zmilrsrinp/12_51/377-418-20977638-1-v1.log.1', '982be0f8/kafka/topic-zmilrsrinp/12_51/3652-3693-20977639-1-v1.log.1', '1d95ce80/kafka/topic-zmilrsrinp/12_51/292-331-20977360-1-v1.log.1', '83cd2c3a/kafka/topic-zmilrsrinp/12_51/4701-4741-20977562-1-v1.log.1', 'acf5d5b4/kafka/topic-zmilrsrinp/12_51/3736-3777-20977639-1-v1.log.1', '2d65cc47/kafka/topic-zmilrsrinp/12_51/252-291-20977360-1-v1.log.1', '0d23e49c/kafka/topic-zmilrsrinp/12_51/1049-1088-20977360-1-v1.log.1', 'd6b44ae8/kafka/topic-zmilrsrinp/12_51/1803-1846-20977918-1-v1.log.1', '1257b97e/kafka/topic-zmilrsrinp/12_51/2601-2643-20977780-1-v1.log.1', '2b03c1b8/kafka/topic-zmilrsrinp/12_51/2435-2476-20977639-1-v1.log.1', '38d22c9d/kafka/topic-zmilrsrinp/12_51/797-838-20977638-1-v1.log.1', '14d29aeb/kafka/topic-zmilrsrinp/12_51/2980-3019-20977360-1-v1.log.1', '887aa9e2/kafka/topic-zmilrsrinp/12_51/3568-3609-20977639-1-v1.log.1', '63ea855a/kafka/topic-zmilrsrinp/12_51/0-41-20977591-1-v1.log.1', '2b51ff73/kafka/topic-zmilrsrinp/12_51/1553-1594-20977639-1-v1.log.1', '893fbe03/kafka/topic-zmilrsrinp/12_51/587-628-20977638-1-v1.log.1', '575847a3/kafka/topic-zmilrsrinp/12_51/461-500-20977360-1-v1.log.1', 
'3d07166f/kafka/topic-zmilrsrinp/12_51/1719-1762-20977918-1-v1.log.1', 'd082fbca/kafka/topic-zmilrsrinp/12_51/3526-3567-20977639-1-v1.log.1', '29aa36a0/kafka/topic-zmilrsrinp/12_51/1847-1888-20977639-1-v1.log.1', '46a527cf/kafka/topic-zmilrsrinp/12_51/1175-1216-20977639-1-v1.log.1', 'c9e3f156/kafka/topic-zmilrsrinp/12_51/3316-3357-20977639-1-v1.log.1', '40840bfa/kafka/topic-zmilrsrinp/12_51/4281-4322-20977639-1-v1.log.1', '9dda12a4/kafka/topic-zmilrsrinp/12_51/1511-1552-20977639-1-v1.log.1', '037912f0/kafka/topic-zmilrsrinp/12_51/3273-3315-20977716-1-v1.log.1', '32a85d88/kafka/topic-zmilrsrinp/12_51/4448-4490-20977655-1-v1.log.1', '953cac9f/kafka/topic-zmilrsrinp/12_51/3860-3901-20977639-1-v1.log.1', '04c1d40e/kafka/topic-zmilrsrinp/12_51/4827-4868-20977645-1-v1.log.1', 'bfed2bb0/kafka/topic-zmilrsrinp/12_51/2644-2685-20977639-1-v1.log.1', 'd88bdb45/kafka/topic-zmilrsrinp/12_51/3987-4028-20977639-1-v1.log.1', '24b6668b/kafka/topic-zmilrsrinp/12_51/4533-4574-20977639-1-v1.log.1', 'c03786bd/kafka/topic-zmilrsrinp/12_51/2812-2853-20977639-1-v1.log.1', '67566786/kafka/topic-zmilrsrinp/12_51/2057-2098-20977639-1-v1.log.1', '8345c737/kafka/topic-zmilrsrinp/12_51/2015-2056-20977639-1-v1.log.1', '45ffb451/kafka/topic-zmilrsrinp/12_51/42-83-20977636-1-v1.log.1', '6a9477c8/kafka/topic-zmilrsrinp/12_51/2351-2392-20977639-1-v1.log.1', 'f8b00efd/kafka/topic-zmilrsrinp/12_51/629-670-20977638-1-v1.log.1', 'db00cb10/kafka/topic-zmilrsrinp/12_51/3484-3525-20977639-1-v1.log.1', '8f7ccf6e/kafka/topic-zmilrsrinp/12_51/4113-4154-20977639-1-v1.log.1', '94350b98/kafka/topic-zmilrsrinp/12_51/2099-2140-20977639-1-v1.log.1', 'ad1e3da7/kafka/topic-zmilrsrinp/12_51/3064-3103-20977360-1-v1.log.1', '37175c4d/kafka/topic-zmilrsrinp/12_51/2854-2895-20977639-1-v1.log.1', 'eb4cf7de/kafka/topic-zmilrsrinp/12_51/3945-3986-20977639-1-v1.log.1', '09db2d4a/kafka/topic-zmilrsrinp/12_51/4491-4532-20977639-1-v1.log.1', '330a0846/kafka/topic-zmilrsrinp/12_51/2393-2434-20977639-1-v1.log.1', 
'7fa89a5d/kafka/topic-zmilrsrinp/12_51/5029-5068-20977360-1-v1.log.1', '6deec357/kafka/topic-zmilrsrinp/12_51/2561-2600-20977360-1-v1.log.1', 'c72193af/kafka/topic-zmilrsrinp/12_51/1005-1048-20977916-1-v1.log.1', '8614f857/kafka/topic-zmilrsrinp/12_51/2309-2350-20977639-1-v1.log.1', '26705615/kafka/topic-zmilrsrinp/12_51/3902-3944-20977841-1-v1.log.1', '1d657d6c/kafka/topic-zmilrsrinp/12_51/5229-5268-20977360-1-v1.log.1', '3a0a21de/kafka/topic-zmilrsrinp/12_51/1301-1342-20977639-1-v1.log.1', 'c3d9107a/kafka/topic-zmilrsrinp/12_51/4365-4404-20977360-1-v1.log.1', 'a5aa7ba0/kafka/topic-zmilrsrinp/12_51/3442-3483-20977639-1-v1.log.1', '50d95089/kafka/topic-zmilrsrinp/12_51/1973-2014-20977639-1-v1.log.1', '5987db7d/kafka/topic-zmilrsrinp/12_51/4323-4364-20977639-1-v1.log.1', '9fd7ca1c/kafka/topic-zmilrsrinp/12_51/4949-4988-20977360-1-v1.log.1', 'f82f2a9a/kafka/topic-zmilrsrinp/12_51/839-878-20977360-1-v1.log.1', 'df34d9ae/kafka/topic-zmilrsrinp/12_51/4405-4447-20977841-1-v1.log.1', '81cfe030/kafka/topic-zmilrsrinp/12_51/713-754-20977638-1-v1.log.1', '2babec87/kafka/topic-zmilrsrinp/12_51/1259-1300-20977639-1-v1.log.1', '8a47cf0f/kafka/topic-zmilrsrinp/12_51/1637-1676-20977360-1-v1.log.1', '55aa25ec/kafka/topic-zmilrsrinp/12_51/1217-1258-20977639-1-v1.log.1', 'f3451177/kafka/topic-zmilrsrinp/12_51/1595-1636-20977639-1-v1.log.1', 'fb7c057b/kafka/topic-zmilrsrinp/12_51/3020-3063-20977918-1-v1.log.1'], 'last_complete_scrub_at': 1700085220006}]")
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 159, in wrapped
    self.redpanda.maybe_do_internal_scrub()
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 3894, in maybe_do_internal_scrub
    raise RuntimeError(
RuntimeError: Internal object storage scrub detected fatal anomalies: [...]

JIRA Link: CORE-1603

andijcr, Nov 28 '23
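The `missing_segments` entries in the report are object keys. As a reading aid, here is a minimal Python sketch that parses them and summarises each anomaly by partition. The key layout is inferred from the keys in this report, not taken from Redpanda's source: it assumes `<hash>/<ns>/<topic>/<partition>_<revision>/<base_offset>-<last_offset>-<size>-<term>-v1.log.<archiver_term>`, and the meaning of the size and term fields in particular is a guess. `parse_segment_key` and `summarize` are hypothetical helpers.

```python
import re
from dataclasses import dataclass

# Assumed key layout, inferred from the keys in this report (not from Redpanda's source):
#   <hash>/<ns>/<topic>/<partition>_<revision>/<base>-<last>-<size>-<term>-v1.log.<archiver_term>
SEGMENT_KEY = re.compile(
    r"^(?P<hash>[0-9a-f]{8})/(?P<ns>[^/]+)/(?P<topic>[^/]+)/"
    r"(?P<partition>\d+)_(?P<revision>\d+)/"
    r"(?P<base>\d+)-(?P<last>\d+)-(?P<size>\d+)-(?P<term>\d+)"
    r"-v1\.log\.(?P<archiver_term>\d+)$"
)


@dataclass(frozen=True)
class SegmentKey:
    topic: str
    partition: int
    revision: int
    base_offset: int
    last_offset: int


def parse_segment_key(key: str) -> SegmentKey:
    """Hypothetical helper: split one missing-segment key into its fields."""
    m = SEGMENT_KEY.match(key)
    if m is None:
        raise ValueError(f"unrecognised segment key: {key}")
    return SegmentKey(
        topic=m["topic"],
        partition=int(m["partition"]),
        revision=int(m["revision"]),
        base_offset=int(m["base"]),
        last_offset=int(m["last"]),
    )


def summarize(anomalies: list[dict]) -> dict[int, tuple[int, int, int]]:
    """Map partition -> (missing segment count, lowest base offset, highest last offset)."""
    summary = {}
    for anomaly in anomalies:
        keys = [parse_segment_key(k) for k in anomaly["missing_segments"]]
        summary[anomaly["partition"]] = (
            len(keys),
            min(k.base_offset for k in keys),
            max(k.last_offset for k in keys),
        )
    return summary
```

Fed only the partition 36 entry from the report, `summarize` returns `{36: (3, 4945, 5226)}`.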

This seems to be a problem with deleting segments. The segments may have been only partially deleted (deletion happens in batches), and since not all of them were deleted, the manifest was not updated. The anomaly is caused by the manifest still pointing to segments that had already been deleted:

Log of segments being deleted:

INFO  2023-11-15 21:53:05,409 [shard 3:au  ] archival - [fiber111 kafka/topic-zmilrsrinp/12] - ntp_archiver_service.cc:2670 - Deleting segment from cloud storage: {"63ea855a/kafka/topic-zmilrsrinp/12_51/0-41-20977591-1-v1.log.1"}
INFO  2023-11-15 21:53:05,409 [shard 3:au  ] archival - [fiber111 kafka/topic-zmilrsrinp/12] - ntp_archiver_service.cc:2670 - Deleting segment from cloud storage: {"45ffb451/kafka/topic-zmilrsrinp/12_51/42-83-20977636-1-v1.log.1"}
...

Soon after, the NTP archiver service logs that the delete failed:

INFO  2023-11-15 21:53:29,983 [shard 3:au  ] archival - [fiber111 kafka/topic-zmilrsrinp/12] - ntp_archiver_service.cc:2718 - Failed to delete all selected segments from cloud storage. Will retry on the next housekeeping run.

Presumably many of the segments have actually been deleted by this point, but we do not update the manifest.

At some point after this, the scrub seems to have run and detected the missing segments:

WARN  2023-11-15 21:53:35,332 [shard 3:au  ] cloud_storage - [fiber113~0~1|1|299970ms] - remote.cc:943 - HeadObject from {panda-bucket-789abc7a-8400-11ee-b7d0-35661a636e44}, {key_not_found}, segment at {"63ea855a/kafka/topic-zmilrsrinp/12_51/0-41-20977591-1-v1.log.1"} not available
...
INFO  2023-11-15 21:53:40,006 [shard 3:au  ] archival - [fiber113 kafka/topic-zmilrsrinp/12] - scrubber.cc:139 - Scrub which started at {nullopt} finished at {nullopt} with status {full} and detected {missing_partition_manifest: false, missing_spillover_manifests: 0, missing_segments: 124, segment_metadata_anomalies: 0} and used 127 quota

abhijat, Dec 05 '23
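The partial-batch-delete hypothesis above can be modelled as a toy, in-memory Python sketch. `Bucket`, `Manifest`, `delete_batch` and `scrub` are hypothetical names, not Redpanda's archival code. The model assumes a batch delete attempts every key, keeps whichever deletes succeeded, and updates the manifest only when the whole batch succeeded, so a later scrub sees the manifest referencing keys that are already gone.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    objects: set[str]
    fail_on: set[str] = field(default_factory=set)  # keys whose delete will fail

    def delete(self, key: str) -> bool:
        if key in self.fail_on:
            return False
        self.objects.discard(key)
        return True


@dataclass
class Manifest:
    segments: list[str]


def delete_batch(bucket: Bucket, manifest: Manifest, batch: list[str]) -> bool:
    """Attempt every delete in the batch; successful deletes are never rolled back."""
    results = [bucket.delete(key) for key in batch]
    if all(results):
        manifest.segments = [s for s in manifest.segments if s not in batch]
        return True
    # Partial failure: the manifest is left untouched and the batch is retried later.
    return False


def scrub(bucket: Bucket, manifest: Manifest) -> list[str]:
    """Report every segment referenced by the manifest that is absent from the bucket."""
    return [s for s in manifest.segments if s not in bucket.objects]
```

For example, with `Bucket({"a", "b", "c"}, fail_on={"b"})` and `Manifest(["a", "b", "c"])`, `delete_batch(bucket, manifest, ["a", "b"])` returns `False`, and `scrub` then reports `["a"]`: a segment the manifest still lists but the bucket no longer holds.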

The reason outlined in the comment above seems unlikely: GC works on segments that are already in the replaced list or below the manifest start offset, so the scrubber would not end up looking for those segments in the bucket.

abhijat, Dec 05 '23

One possibility is that GC collects and removes all segments below the manifest start offset, whereas the scrub starts at the beginning of the manifest's segment collection. Perhaps the scrub is starting earlier than it should, and it needs to start at the manifest start offset.

DEBUG 2023-11-15 21:53:05,211 [shard 3:main] cluster - ntp: {kafka/topic-zmilrsrinp/12} - archival_metadata_stm.cc:1313 - Updating start offset, current value 4827, update 5269
DEBUG 2023-11-15 21:53:05,211 [shard 3:main] cluster - ntp: {kafka/topic-zmilrsrinp/12} - archival_metadata_stm.cc:1320 - Start offset updated to 5269

abhijat, Dec 05 '23
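A simplified Python sketch of the suspected mismatch, assuming (as the comment above suggests) that GC may remove any segment lying wholly below the manifest start offset, while the scrubber walks the manifest's segment list from its first entry. `Segment`, `gc_candidates`, `scrub_targets` and `possible_false_positives` are hypothetical names; the example offsets come from the partition 12 keys and the start offset update (4827 to 5269) in the log above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base_offset: int
    last_offset: int


def gc_candidates(segments: list[Segment], start_offset: int) -> list[Segment]:
    """Segments wholly below the start offset, which GC may delete from the bucket."""
    return [s for s in segments if s.last_offset < start_offset]


def scrub_targets(segments: list[Segment], start_offset: int,
                  from_start_offset: bool) -> list[Segment]:
    """Segments the scrubber would look up (HeadObject) in the bucket."""
    if from_start_offset:
        return [s for s in segments if s.last_offset >= start_offset]
    return list(segments)  # suspected behaviour: begin at the first manifest entry


def possible_false_positives(segments: list[Segment], start_offset: int,
                             from_start_offset: bool) -> list[Segment]:
    """Scrub targets that GC may already have deleted, i.e. spurious 'missing' reports."""
    gc = set(gc_candidates(segments, start_offset))
    return [s for s in scrub_targets(segments, start_offset, from_start_offset) if s in gc]
```

With `[Segment(4827, 4868), Segment(4869, 4908), Segment(5229, 5268)]` and start offset 5269, scrubbing from the first manifest entry can flag all three segments, while scrubbing from the start offset flags none. In this model, starting at the start offset makes the scrub set and the GC set disjoint by construction.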

*https://buildkite.com/redpanda/vtools/builds/11071

vbotbuildovich, Dec 13 '23

*https://buildkite.com/redpanda/vtools/builds/11190

vbotbuildovich, Dec 19 '23

> One possibility is that GC collects and removes all segments below manifest start offset, whereas scrub starts at the beginning of the manifest segment collection. Perhaps the scrub is starting earlier than it should, and it needs to start at manifest start offset.
>
> DEBUG 2023-11-15 21:53:05,211 [shard 3:main] cluster - ntp: {kafka/topic-zmilrsrinp/12} - archival_metadata_stm.cc:1313 - Updating start offset, current value 4827, update 5269
> DEBUG 2023-11-15 21:53:05,211 [shard 3:main] cluster - ntp: {kafka/topic-zmilrsrinp/12} - archival_metadata_stm.cc:1320 - Start offset updated to 5269

This seems correct, and the latest failures show that the missing segments are before the start offset but still in the manifest's segment_set. So it is technically not a problem, but it is some form of race condition between cleaning up the data and updating the manifest. Setting sev/medium, also given the frequency.

andijcr, Jan 05 '24
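The observation that the missing segments sit before the start offset can be checked mechanically by splitting the reported keys on whether each segment ends below it. A minimal Python sketch, assuming the second number in the file name is the segment's last offset (inferred from contiguous ranges such as `0-41` and `42-83` in the report); `offset_range` and `split_by_start_offset` are hypothetical helpers. For instance, the highest range in the partition 12 list, `5229-5268`, ends just below the updated start offset of 5269.

```python
import re

# Assumption: file names look like "<base>-<last>-<size>-<term>-v1.log.<n>";
# only <base> and <last> are used here.
OFFSETS = re.compile(r"/(\d+)-(\d+)-\d+-\d+-v1\.log\.\d+$")


def offset_range(key: str) -> tuple[int, int]:
    """Return (base_offset, last_offset) parsed from a segment key."""
    m = OFFSETS.search(key)
    if m is None:
        raise ValueError(f"unrecognised segment key: {key}")
    return int(m.group(1)), int(m.group(2))


def split_by_start_offset(keys: list[str], start_offset: int) -> tuple[list[str], list[str]]:
    """Return (keys wholly below start_offset, keys at or above it), preserving order."""
    below, live = [], []
    for key in keys:
        _, last = offset_range(key)
        (below if last < start_offset else live).append(key)
    return below, live
```

If `live` comes back empty, every reported segment is one that GC was already allowed to delete, which, under these assumptions, points at the cleanup/manifest race rather than at lost data.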

*https://buildkite.com/redpanda/vtools/builds/11393

vbotbuildovich, Jan 11 '24

*https://buildkite.com/redpanda/vtools/builds/11721

vbotbuildovich, Feb 06 '24

*https://buildkite.com/redpanda/vtools/builds/11929

vbotbuildovich, Feb 21 '24

*https://buildkite.com/redpanda/vtools/builds/12030

vbotbuildovich, Mar 01 '24

*https://buildkite.com/redpanda/vtools/builds/12172#018e1a81-7a47-474d-8711-c70337e96497

vbotbuildovich, Mar 14 '24

*https://buildkite.com/redpanda/vtools/builds/12472

vbotbuildovich, Mar 23 '24

@abhijat, should we leave this open, or are you going to rework this anyway, in which case we should disable or redo this test?

piyushredpanda, Mar 23 '24

*https://buildkite.com/redpanda/vtools/builds/13369

vbotbuildovich, May 01 '24

*https://buildkite.com/redpanda/vtools/builds/13519

vbotbuildovich, May 07 '24

*https://buildkite.com/redpanda/vtools/builds/13970

vbotbuildovich, May 22 '24

*https://buildkite.com/redpanda/vtools/builds/14129

vbotbuildovich, May 29 '24

*https://buildkite.com/redpanda/vtools/builds/13369
*https://buildkite.com/redpanda/vtools/builds/13519
*https://buildkite.com/redpanda/vtools/builds/13970
*https://buildkite.com/redpanda/vtools/builds/14129

vbotbuildovich, Jun 11 '24

*https://buildkite.com/redpanda/vtools/builds/13369
*https://buildkite.com/redpanda/vtools/builds/13519
*https://buildkite.com/redpanda/vtools/builds/13970
*https://buildkite.com/redpanda/vtools/builds/14129

vbotbuildovich, Jun 12 '24

Closing older-bot-filed CI issues as we transition to a more reliable system.

piyushredpanda, Sep 24 '24