Add support to retry download after a read timeout

Open Bechir-Braham opened this issue 6 months ago • 4 comments

Description of the feature request:

Even though there are multiple options to retry downloads, such as --http_connector_attempts and --experimental_repository_downloader_retries, running into a download timeout during fetching does not trigger a retry.

Would it be possible to treat download timeouts as another trigger for --experimental_repository_downloader_retries?

I have to admit I don't have any experience with the Bazel source code, but I can see here that only ContentLengthMismatchException and SocketException are considered for retries. Would it be possible to treat SocketTimeoutException as another retriable exception?
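
To illustrate the idea, here is a minimal sketch of a retry loop that also treats SocketTimeoutException as retriable. The names RetrySketch, isRetriable, and withRetries are hypothetical and not taken from the Bazel source, and Bazel's own ContentLengthMismatchException is left out so the example stays self-contained. The real downloader may be structured quite differently.

```java
import java.io.IOException;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/** Hypothetical sketch; names are illustrative and not Bazel's actual code. */
final class RetrySketch {
  /** Returns true if the failure, or any exception in its cause chain, looks transient. */
  static boolean isRetriable(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      // SocketTimeoutException is not a subclass of SocketException,
      // so it has to be listed explicitly to become retriable.
      if (cur instanceof SocketException || cur instanceof SocketTimeoutException) {
        return true;
      }
    }
    return false;
  }

  /** Runs the download, retrying up to maxRetries times on retriable I/O errors. */
  static <T> T withRetries(int maxRetries, Callable<T> download) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return download.call();
      } catch (IOException e) {
        if (attempt >= maxRetries || !isRetriable(e)) {
          throw e; // Out of retries, or not a transient failure.
        }
      }
    }
  }
}
```

With something like this, a "Read timed out" error wrapped in an IOException would consume one of the configured retries instead of failing the fetch immediately.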

I would be happy to contribute to the implementation of this request if you can confirm that this is possible and not disallowed by design.

Which category does this issue belong to?

Core

What underlying problem are you trying to solve with this feature?

The underlying problem we're facing is that during peak request times the artifact storage can start to time out on downloads. These errors happened with --http_connector_attempts=11 and --experimental_repository_downloader_retries=10 set. In this case it would be very beneficial to also use the retries from --experimental_repository_downloader_retries to re-run downloads when there is a read timeout error.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

"release 7.6.0" (Also should be the same behaviour for 8+)

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?


Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

Here is an example from CI:

clang-tidy	Run
 bazel build --config=clang-tidy --build_tag_filters=-no-clang-tidy,-fast-clang-tidy --target_pattern_file=tools/target_patterns/clang_tidy.params
2025-05-27T06:01:48.2971478Z ERROR: some/path/BUILD:465:8: //some/path/some_test depends on @@_main~_repo_rules~some_test_data//:srcs in repository @@_main~_repo_rules~some_test_data which failed to fetch. 
no such package '@@_main~_repo_rules~some_test_data//': java.io.IOException: Error downloading [https://artifacts_registry_url/repository/path/file.zip] to /mnt/data/bazel-user-root/45ef7d2bd11527ab6fca94135f0ad0a0/external/_main~_repo_rules~some_test_data/file.zip: java.net.SocketTimeoutException: Read timed out

Bechir-Braham, May 27 '25 15:05

Sure, feel free to send a PR!

meteorcloudy, Jun 10 '25 15:06

Instead of retrying on a timeout, would a longer timeout work for you? I don't know whether it's configurable yet, but allowing it to be increased seems more natural to me than retrying it.

fmeum, Jun 10 '25 15:06

@fmeum Thanks, that's a good point, and we do have --experimental_scale_timeouts, which might be a better fit to work around the issue? @Bechir-Braham

meteorcloudy, Jun 10 '25 17:06

OK, that flag controls the repository_ctx.execute timeout; the one that controls the HTTP download timeout is --http_timeout_scaling.

meteorcloudy, Jun 10 '25 17:06