redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

filesystem errors during FranzGoVerifiableWithSiTest/test_si_without_timeboxed/segment_size=104857600

Open abhijat opened this issue 1 year ago • 3 comments

[INFO  - 2022-08-06 07:54:38,171 - runner_client - log - lineno:278]: RunnerClient: rptest.scale_tests.franz_go_verifiable_test.FranzGoVerifiableWithSiTest.test_si_without_timeboxed.segment_size=104857600: FAIL: <BadLogLines nodes=ip-172-31-50-177(32),ip-172-31-50-215(15),ip-172-31-58-52(24) example="ERROR 2022-08-06 05:03:17,873 [shard 2] cloud_storage - [fiber3~8~37~0|1|46632ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/08cdff59/kafka/topic-nfwsdorgnf/76_21/0-1-v1.log.1.index"])">
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 48, in wrapped
    self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1122, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-50-177(32),ip-172-31-50-215(15),ip-172-31-58-52(24) example="ERROR 2022-08-06 05:03:17,873 [shard 2] cloud_storage - [fiber3~8~37~0|1|46632ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/08cdff59/kafka/topic-nfwsdorgnf/76_21/0-1-v1.log.1.index"])">

these are repeated quite often during scale test:

[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:00:49,780 [shard 3] cloud_storage - [fiber6~75~9~0|1|49342ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/6d4ce52f/kafka/topic-nfwsdorgnf/9_21/40-2-v1.log.2"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:03:16,346 [shard 2] cloud_storage - [fiber13~70~159~0|1|46513ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/980a4514/kafka/topic-nfwsdorgnf/52_21/101-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:06:35,736 [shard 2] cloud_storage - [fiber13~72~9~0|1|47297ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/a9ea197e/kafka/topic-nfwsdorgnf/52_21/0-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:09:48,815 [shard 1] cloud_storage - [fiber18~76~29~0|1|58942ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/20743e08/kafka/topic-nfwsdorgnf/15_21/0-1-v1.log.2.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:18:07,990 [shard 1] cloud_storage - [fiber18~82~66~0|1|55568ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/20743e08/kafka/topic-nfwsdorgnf/15_21/0-1-v1.log.2"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:18:22,213 [shard 3] cloud_storage - [fiber10~75~13~0|1|45536ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/9314b74e/kafka/topic-nfwsdorgnf/97_21/110-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:21:31,008 [shard 1] rpc - server.cc:126 - kafka rpc protocol - Error[applying protocol] remote address: 205.210.31.19:57193 - std::runtime_error (Unexpected EOF for client ID)
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:29:50,532 [shard 3] cloud_storage - [fiber21~45~87~0|1|46615ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/228ea5bd/kafka/topic-nfwsdorgnf/45_21/0-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,169 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:42:05,136 [shard 3] cloud_storage - [fiber10~87~12~0|1|45809ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/9314b74e/kafka/topic-nfwsdorgnf/97_21/110-1-v1.log.1.index"])
[WARNING - 2022-08-06 07:54:38,170 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:50:37,605 [shard 3] cloud_storage - [fiber6~92~319~0|1|56266ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/12426a3f/kafka/topic-nfwsdorgnf/9_21/0-1-v1.log.2"])
[WARNING - 2022-08-06 07:54:38,170 - redpanda - raise_on_bad_logs - lineno:1107]: [test_si_without_timeboxed] Unexpected log line on ip-172-31-58-52: ERROR 2022-08-06 07:52:44,012 [shard 2] cloud_storage - [fiber13~85~145~0|1|50993ms] - remote.cc:114 - System error std::__1::__fs::filesystem::filesystem_error (error system:2, filesystem error: stat failed: No such file or directory ["/var/lib/redpanda/data/cloud_storage_cache/a9ea197e/kafka/topic-nfwsdorgnf/52_21/0-1-v1.log.1.index"])

abhijat avatar Aug 06 '22 08:08 abhijat

Looks like cache eviction removes files right after they're downloaded or generated. Or maybe the node is out of disk space.

Lazin avatar Aug 08 '22 14:08 Lazin

@abhijat another here https://buildkite.com/redpanda/vtools/builds/3139#01827b68-ebc9-4ff8-867f-d9461479bafe

dotnwat avatar Aug 09 '22 04:08 dotnwat

these errors should hopefully reduce/disappear with #5915 but we still need to investigate root cause, so we can keep ticket open.

It is easy to reproduce them for investigation, we can just set the cache size to 5mb on the CDT driver node in the scale_tests/franz_go..py test module.

abhijat avatar Aug 09 '22 15:08 abhijat

this hasn't been seen in a long while but @abhijat wants to investigate/RCA. Adding to v22.3 stretch items which is a placeholder for such tickets.

piyushredpanda avatar Sep 23 '22 09:09 piyushredpanda

This looks the same as https://github.com/redpanda-data/redpanda/issues/6601

Was fixed by https://github.com/redpanda-data/redpanda/pull/6794

jcsp avatar Nov 03 '22 15:11 jcsp