CI Failure (`cloud_storage::download_exception (NotFound)`) in `CloudRetentionTest.test_cloud_retention`

Open vbotbuildovich opened this issue 1 year ago • 2 comments

https://buildkite.com/redpanda/vtools/builds/11824

Module: rptest.tests.cloud_retention_test
Class: CloudRetentionTest
Method: test_cloud_retention
Arguments: {
    "cloud_storage_type": 1,
    "max_consume_rate_mb": null
}
test_id:    CloudRetentionTest.test_cloud_retention
status:     FAIL
run time:   472.215 seconds

<BadLogLines nodes=ip-172-31-12-189(1) example="ERROR 2024-02-12 12:19:23,003 [shard 2:fetc] kafka - fetch.cc:1171 - unknown exception thrown: cloud_storage::download_exception (NotFound)">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 173, in wrapped
    redpanda.raise_on_bad_logs(
  File "/home/ubuntu/redpanda/tests/rptest/services/redpanda.py", line 1412, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-12-189(1) example="ERROR 2024-02-12 12:19:23,003 [shard 2:fetc] kafka - fetch.cc:1171 - unknown exception thrown: cloud_storage::download_exception (NotFound)">
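
For readers unfamiliar with this kind of failure: the test body itself passed, and the exception was raised afterwards because an ERROR-level line appeared in a node's log. The following is a minimal sketch of such a check, not the actual rptest code. The `ERROR_RE` pattern, the `find_bad_lines` helper, the `allow_list` parameter, and the `BadLogLines` stand-in class are all assumptions made for illustration.

```python
import re

# Assumption: any line that starts with "ERROR" counts as a bad log line.
ERROR_RE = re.compile(r"^ERROR\b")


class BadLogLines(Exception):
    """Illustrative stand-in for rptest.services.utils.BadLogLines."""

    def __init__(self, node_to_lines):
        self.node_to_lines = node_to_lines
        super().__init__(f"bad log lines on nodes: {sorted(node_to_lines)}")


def find_bad_lines(logs_by_node, allow_list=()):
    """Return {node: [bad lines]} for ERROR lines not matched by allow_list."""
    bad = {}
    for node, lines in logs_by_node.items():
        for line in lines:
            if not ERROR_RE.match(line):
                continue
            # Skip lines that a (hypothetical) per-test allow list permits.
            if any(pattern.search(line) for pattern in allow_list):
                continue
            bad.setdefault(node, []).append(line)
    return bad


def raise_on_bad_logs(logs_by_node, allow_list=()):
    """Raise BadLogLines if any node logged a disallowed ERROR line."""
    bad = find_bad_lines(logs_by_node, allow_list)
    if bad:
        raise BadLogLines(bad)
```

Under this sketch, a single unexpected ERROR line on one node is enough to fail an otherwise successful test run.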

vbotbuildovich avatar Feb 13 '24 20:02 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/11875
* https://buildkite.com/redpanda/vtools/builds/11880
* https://buildkite.com/redpanda/vtools/builds/11881

vbotbuildovich avatar Feb 17 '24 00:02 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/11886
* https://buildkite.com/redpanda/vtools/builds/11892
* https://buildkite.com/redpanda/vtools/builds/11891

vbotbuildovich avatar Feb 18 '24 00:02 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/11895
* https://buildkite.com/redpanda/vtools/builds/11901
* https://buildkite.com/redpanda/vtools/builds/11902

vbotbuildovich avatar Feb 19 '24 00:02 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/11905
* https://buildkite.com/redpanda/vtools/builds/11911
* https://buildkite.com/redpanda/vtools/builds/11910

vbotbuildovich avatar Feb 20 '24 00:02 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/11929
* https://buildkite.com/redpanda/vtools/builds/11937
* https://buildkite.com/redpanda/vtools/builds/11938

vbotbuildovich avatar Feb 21 '24 00:02 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/11944
* https://buildkite.com/redpanda/vtools/builds/11952
* https://buildkite.com/redpanda/vtools/builds/11951

vbotbuildovich avatar Feb 22 '24 00:02 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/11957
* https://buildkite.com/redpanda/vtools/builds/11962
* https://buildkite.com/redpanda/vtools/builds/11963

vbotbuildovich avatar Feb 23 '24 00:02 vbotbuildovich

* https://buildkite.com/redpanda/vtools/builds/11967
* https://buildkite.com/redpanda/vtools/builds/11971

vbotbuildovich avatar Feb 24 '24 00:02 vbotbuildovich

The line that is printed as part of this CI report is

rptest.services.utils.BadLogLines: <BadLogLines nodes=ip-172-31-12-189(1) example="ERROR 2024-02-12 12:19:23,003 [shard 2:fetc] kafka - fetch.cc:1171 - unknown exception thrown: cloud_storage::download_exception (NotFound)">

But that log line was removed by the commit below, which also changed how exceptions originating outside the fetch handler are dealt with when they bubble up.

commit a09d160db0f74034cdf47ae5204afbe5a7218cad
Author: Brandon Allard <[email protected]>
Date:   Thu Feb 8 22:41:52 2024 -0500

    kafka: rethrow on unknown exceptions in fetch handler

    Exceptions from outside the Kafka handler context can bubble up to the
    catch present in the handler. This seems to be the way some subsystems
    communicate issues with the requests. There is no current listing of
    what exceptions subsystems may throw, if/how to recover from these
    exceptions, or if a fetch should end as a result of the exception. Hence
    for the time being the fetch impl will revert to the behavior of further
    bubbling up unknown exceptions to outside of the handler context.

 src/v/kafka/server/handlers/fetch.cc | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)
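
As a rough illustration of the pattern the commit message describes, here is a minimal Python analogy. The real change is C++ in `fetch.cc` and is not reproduced here; `KnownFetchError` and `handle_fetch` are hypothetical names, and the shape of the error response is an assumption.

```python
class KnownFetchError(Exception):
    """Hypothetical error type that the handler knows how to answer."""


def handle_fetch(read_fn):
    """Run read_fn, answering known errors and rethrowing unknown ones."""
    try:
        return read_fn()
    except KnownFetchError as err:
        # Known failure mode: convert it into an error response.
        return {"error": str(err)}
    except Exception:
        # Unknown exception from another subsystem: the handler cannot
        # tell whether it is recoverable, so it propagates it unchanged
        # instead of logging it and carrying on.
        raise
```

With this shape, an exception the handler does not recognize (for example, one raised by a storage layer) reaches the caller rather than ending in a handler-level "unknown exception thrown" log line.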

dotnwat avatar Feb 26 '24 19:02 dotnwat

Indeed, that commit was marked as resolving a nearly identical CI failure report.

So, this can be considered a duplicate of https://github.com/redpanda-data/redpanda/issues/16532 and was resolved by https://github.com/redpanda-data/redpanda/pull/16554.

dotnwat avatar Feb 26 '24 19:02 dotnwat