redpanda
redpanda copied to clipboard
filesystem errors during FranzGoVerifiableWithSiTest/test_si_without_timeboxed/segment_size=104857600
[INFO - 2022-08-06 07:54:38,171 - runner_client - log - lineno:278]: RunnerClient: rptest.scale_tests.franz_go_verifiable_test.FranzGoVerifiableWithSiTest.test_si_without_timeboxed.segment_size=104857600: FAIL: <BadLogLines nodes=ip-172-31-50-177(32),ip-172-31-50-215(15),ip-172-31-58-52(24) example="ERROR 2022-08-06 05:03:17,873 [shard 2] cloud_storage - [fiber3~8~37~0|1|46632ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/08cdff59/kafka/topic-nfwsdorgnf/76_21/0-1-v1.log.1.index"])">
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
data = self.run_test()
File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
return self.test_context.function(self.test)
File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 476, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 48, in wrapped
self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1122, in raise_on_bad_logs
raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-50-177(32),ip-172-31-50-215(15),ip-172-31-58-52(24) example="ERROR 2022-08-06 05:03:17,873 [shard 2] cloud_storage - [fiber3~8~37~0|1|46632ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/08cdff59/kafka/topic-nfwsdorgnf/76_21/0-1-v1.log.1.index"])">
these are repeated quite often during scale test:
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:00:49,780 [shard 3] cloud_storage - [fiber6~75~9~0|1|49342ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/6d4ce52f/kafka/topic-nfwsdorgnf/9_21/40-2-v1.log.2"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:03:16,346 [shard 2] cloud_storage - [fiber13~70~159~0|1|46513ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/980a4514/kafka/topic-nfwsdorgnf/52_21/101-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:06:35,736 [shard 2] cloud_storage - [fiber13~72~9~0|1|47297ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/a9ea197e/kafka/topic-nfwsdorgnf/52_21/0-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:09:48,815 [shard 1] cloud_storage - [fiber18~76~29~0|1|58942ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/20743e08/kafka/topic-nfwsdorgnf/15_21/0-1-v1.log.2.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:18:07,990 [shard 1] cloud_storage - [fiber18~82~66~0|1|55568ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/20743e08/kafka/topic-nfwsdorgnf/15_21/0-1-v1.log.2"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:18:22,213 [shard 3] cloud_storage - [fiber10~75~13~0|1|45536ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/9314b74e/kafka/topic-nfwsdorgnf/97_21/110-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:21:31,008 [shard 1] rpc - server.cc:126 - kafka rpc protocol - Error[applying protocol] remote address: 205.210.31.19:57193 - std::runtime_error (Unexpected EOF for client ID)
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:29:50,532 [shard 3] cloud_storage - [fiber21~45~87~0|1|46615ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/228ea5bd/kafka/topic-nfwsdorgnf/45_21/0-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:42:05,136 [shard 3] cloud_storage - [fiber10~87~12~0|1|45809ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/9314b74e/kafka/topic-nfwsdorgnf/97_21/110-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,170 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:50:37,605 [shard 3] cloud_storage - [fiber6~92~319~0|1|56266ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/12426a3f/kafka/topic-nfwsdorgnf/9_21/0-1-v1.log.2"])
[WARNING - 2022-08-06 07:54:38,170 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:52:44,012 [shard 2] cloud_storage - [fiber13~85~145~0|1|50993ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/a9ea197e/kafka/topic-nfwsdorgnf/52_21/0-1-v1.log.1.index"])
Looks like cache eviction removes files right after they're downloaded or generated. Or maybe the node is out of disk space.
@abhijat another here https://buildkite.com/redpanda/vtools/builds/3139#01827b68-ebc9-4ff8-867f-d9461479bafe
these errors should hopefully reduce/disappear with #5915 but we still need to investigate root cause, so we can keep ticket open.
It is easy to reproduce them for investigation, we can just set the cache size to 5mb on the CDT driver node in the scale_tests/franz_go..py test module.
this hasn't been seen in a long while but @abhijat wants to investigate/RCA. Adding to v22.3 stretch items which is a placeholder for such tickets.
This looks the same as https://github.com/redpanda-data/redpanda/issues/6601
Was fixed by https://github.com/redpanda-data/redpanda/pull/6794