
Tracking Issue: Concurrent Downloads

Open reitermarkus opened this issue 1 year ago • 11 comments

Concurrent Downloads

  • [x] Implement MVP of concurrent downloads for brew fetch.

    Implemented in https://github.com/Homebrew/brew/pull/17756.

  • [x] Improve output logic.

    ~~For simplicity, the output currently only uses at most terminal height - 1 lines due to a trailing newline.~~

    Implemented in https://github.com/Homebrew/brew/pull/19194.

  • [ ] Handle custom download strategies.

    These may not be possible to run concurrently so let's handle them differently.

  • [ ] Implement graceful cancellation of downloads.

    Currently, cancelling downloads can only be done by killing the whole thread pool, i.e. the sledgehammer approach. Proper cancellation based on https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/Cancellation.html should be implemented, making it possible to neatly show successful, failed and cancelled downloads.

  • [ ] Replace direct puts output with different output formatters for serial/concurrent download output.

  • [ ] Implement concurrent downloads for brew install, brew reinstall, brew upgrade etc. with a --concurrency flag.

  • [ ] Implement global concurrent downloads with a public/documented/supported HOMEBREW_DOWNLOADS_CONCURRENCY=<int> opt-in variable.

  • [ ] Limit concurrent connections per host to avoid overloading smaller (i.e. non-CDN) web servers.

  • [ ] Enable concurrent downloads by default, i.e. change default value for --concurrency flag.
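As a rough illustration of the graceful-cancellation item above, here is a minimal sketch of cooperative cancellation using only Ruby's standard library. It is not how Homebrew implements downloads and does not use `Concurrent::Cancellation`; `CancelToken`, `Cancelled` and `run_downloads` are hypothetical names made up for this example. Each job polls the token, so the caller can report successful, failed and cancelled downloads separately.

```ruby
# Hypothetical sketch using only Ruby's stdlib; none of these names exist in Homebrew.
class Cancelled < StandardError; end

class CancelToken
  def initialize
    @mutex = Mutex.new
    @cancelled = false
  end

  def cancel!
    @mutex.synchronize { @cancelled = true }
  end

  def cancelled?
    @mutex.synchronize { @cancelled }
  end

  # Jobs call this between chunks of work to stop cooperatively.
  def check!
    raise Cancelled if cancelled?
  end
end

# jobs: Hash of name => callable taking the token. Returns a Hash of name => status.
def run_downloads(jobs, token)
  threads = jobs.map do |name, job|
    Thread.new do
      begin
        job.call(token)
        [name, :succeeded]
      rescue Cancelled
        [name, :cancelled]
      rescue StandardError
        [name, :failed]
      end
    end
  end
  threads.map(&:value).to_h
end
```

The trade-off is that cancellation only takes effect where a job calls `check!`, so a job blocked in a long system call would not stop until that call returns.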

reitermarkus avatar Sep 07 '24 13:09 reitermarkus

Thanks for the write-up @reitermarkus! Makes sense to me. Please focus on getting https://github.com/Homebrew/brew/pull/17756 merged ASAP.

  • In order to implement the following parts, custom download strategies need to be deprecated. Their API surface is too big since they depend on many private methods from their superclass, making changes here practically impossible without breaking things.

We can't/won't do this. Things as basic as "download a release from a private GitHub repository" require this. Instead, we should expect to provide suboptimal, poor or no progress reporting and graceful cancellation for these strategies.

MikeMcQuaid avatar Sep 07 '24 18:09 MikeMcQuaid

Thoughts so far:

  • missing support for falling back to earlier bottle tags
    $ brew fetch --retry --concurrency=10 boost qt
    Fetching: boost, qt
    Warning: Bottle for tag :arm64_sequoia is unavailable.
    Warning: Bottle for tag :arm64_sequoia is unavailable.
  • the above message should probably also warn which bottle it relates to
  • we should limit the hosts we'll download in parallel from to an allowlist so we don't e.g. try to download the 10 files at once from a poor personal web server when building from source
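One way to picture the concern in the last point is a per-host cap on simultaneous downloads, a related idea to the allowlist suggested above. This is a minimal stdlib-only sketch, not Homebrew code: `HostLimiter` and `with_slot` are hypothetical names, and the limit value is an arbitrary assumption.

```ruby
require "uri"

# Hypothetical sketch: caps the number of simultaneous downloads per host.
class HostLimiter
  def initialize(max_per_host)
    @max = max_per_host
    @active = Hash.new(0) # host => number of downloads currently running
    @mutex = Mutex.new
    @slot_freed = ConditionVariable.new
  end

  # Blocks until the URL's host has a free slot, then runs the block.
  def with_slot(url)
    host = URI(url).host
    @mutex.synchronize do
      @slot_freed.wait(@mutex) while @active[host] >= @max
      @active[host] += 1
    end
    begin
      yield
    ensure
      @mutex.synchronize do
        @active[host] -= 1
        @slot_freed.broadcast # wake waiters so they can re-check their host
      end
    end
  end
end
```

A caller would wrap each download in `limiter.with_slot(url) { ... }`, so downloads from different hosts still run in parallel while a single small server never sees more than the configured number of connections.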

MikeMcQuaid avatar Sep 10 '24 07:09 MikeMcQuaid

We can't/won't do this. Things as basic as "download a release from a private GitHub repository" require this.

If it's that basic, we should support it using an official download strategy. In any case, we don't want to maintain two different types of download strategies.

reitermarkus avatar Sep 10 '24 10:09 reitermarkus

@reitermarkus We historically supported many more types of download strategies that we didn't use e.g. those for private resources. Problem is: when we don't actually use and rely on them ourselves, they end up bitrotting.

In general, even if we were to support them, I'd rather not break an existing public API (implicit or not) for the sake of new functionality if we can instead degrade and simply not support that functionality there.

MikeMcQuaid avatar Sep 10 '24 11:09 MikeMcQuaid

This will probably be easier to discuss with a demonstration of what exactly needs changing in download strategies that would be a breaking change.

Bo98 avatar Sep 10 '24 16:09 Bo98

  • Currently, cancelling downloads can only be done by killing the whole thread pool, i.e. the sledgehammer approach. Proper cancellation based on https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/Cancellation.html should be implemented, making it possible to neatly show successful, failed and cancelled downloads.

Concurrent::Cancellation isn't stable yet so isn't actually available in concurrent-ruby. It's a WIP feature in a separate beta gem for now.

Worth noting that download strategies already need to support Ctrl+C Interrupt exceptions I think. So Thread#raise Interrupt shouldn't be surprising.
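To make the `Thread#raise` idea concrete, here is a small stdlib-only sketch. It assumes a worker that handles `Interrupt` the same way it would handle Ctrl+C; `interrupt_worker` is a hypothetical helper for illustration, not something from Homebrew.

```ruby
# Hypothetical sketch: deliver Interrupt to a worker thread, as Ctrl+C would
# deliver it to the main thread, and let the worker clean up after itself.
def interrupt_worker
  events = Queue.new
  worker = Thread.new do
    begin
      events << :started
      sleep # stands in for a long-running download
    rescue Interrupt
      events << :cleaned_up # e.g. remove a partial file here
      :interrupted
    end
  end
  events.pop # wait until the worker is inside its begin block
  worker.raise(Interrupt, "simulated Ctrl+C")
  [worker.value, events.pop]
end
```

Because `Interrupt` is not a `StandardError`, a worker that only uses a bare `rescue` would not see it; it has to rescue `Interrupt` explicitly or rely on `ensure`.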

Bo98 avatar Sep 10 '24 16:09 Bo98

Thanks for updates @reitermarkus. Have made some edits but feel free to discuss any down here.

MikeMcQuaid avatar Feb 03 '25 14:02 MikeMcQuaid

We don't need HOMEBREW_NO_DOWNLOADS_CONCURRENCY since HOMEBREW_DOWNLOADS_CONCURRENCY=1 would do the same thing.

reitermarkus avatar Feb 03 '25 19:02 reitermarkus

We don't need HOMEBREW_NO_DOWNLOADS_CONCURRENCY since HOMEBREW_DOWNLOADS_CONCURRENCY=1 would do the same thing.

Yeah, that makes sense. Wondering if we'll want to have a HOMEBREW_DOWNLOADS_CONCURRENCY=auto or something though, to avoid requiring users to pick how many threads to use (as we pick it for them for e.g. make).

MikeMcQuaid avatar Feb 03 '25 20:02 MikeMcQuaid

Wondering if we'll want to have a HOMEBREW_DOWNLOADS_CONCURRENCY=auto or something

I think not specifying HOMEBREW_DOWNLOADS_CONCURRENCY or --concurrency would do that anyways once concurrent downloads are the default.
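The semantics being discussed (unset behaves like `auto`, `1` is effectively serial) could be sketched as follows. This is only an illustration: `resolve_concurrency` is a hypothetical helper, and mapping `auto` to the processor count is an assumption, not Homebrew's documented policy.

```ruby
require "etc"

# Hypothetical sketch: turn the raw environment value into a thread count.
# raw is what ENV would return: nil when unset, otherwise a String.
def resolve_concurrency(raw, default: "auto")
  value = raw.to_s.strip
  value = default if value.empty?
  # Assumption: "auto" means one download thread per processor.
  return [Etc.nprocessors, 1].max if value == "auto"

  count = Integer(value, 10) # raises ArgumentError for non-numeric input
  raise ArgumentError, "concurrency must be positive" unless count.positive?

  count
end
```

For example, `resolve_concurrency(ENV["HOMEBREW_DOWNLOADS_CONCURRENCY"])` would return 1 when the variable is set to `1`, which is why a separate opt-out variable adds nothing.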

reitermarkus avatar Feb 04 '25 17:02 reitermarkus

I think not specifying HOMEBREW_DOWNLOADS_CONCURRENCY or --concurrency would do that anyways once concurrent downloads are the default.

Yup, I guess it just seems like a nicer interface than requiring users to figure out and specify a sensible number here before it's the default. Not a blocker or anything, just a nice-to-have.

MikeMcQuaid avatar Feb 05 '25 09:02 MikeMcQuaid

In Homebrew/core CI runs, I've seen some brew fetch get stuck, e.g. 18 hours fetching - https://github.com/Homebrew/homebrew-core/actions/runs/17193798425/job/48773380206?pr=234796#step:3:521

Looks concurrent download related but haven't reproduced locally.

cho-m avatar Aug 25 '25 16:08 cho-m

@cho-m Thanks for the report! Not sure what we can do without a reproduction here. Maybe worth adding more debugging to test-bot specifically?

MikeMcQuaid avatar Aug 26 '25 07:08 MikeMcQuaid

@cho-m Thanks for the report! Not sure what we can do without a reproduction here. Maybe worth adding more debugging to test-bot specifically?

If it is a race condition, then debug logging may make it disappear, because the extra stdout writes change the timing. It may still be worth trying, as it happens relatively frequently (I've seen it in 1+ PR/week). Or at least a max timeout for fetch in CI so that long-timeout runners don't get stuck.
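A hard deadline around a fetch could look roughly like this. It is a generic stdlib sketch rather than anything from `brew test-bot`: `run_with_deadline` is a hypothetical helper, and the command and timeout values are placeholders.

```ruby
require "timeout"

# Hypothetical sketch: run a command, but give up and terminate it after
# `seconds`. Returns :ok, :failed or :timed_out.
def run_with_deadline(command, seconds)
  pid = Process.spawn(*command)
  begin
    status = Timeout.timeout(seconds) { Process.wait2(pid).last }
    status.success? ? :ok : :failed
  rescue Timeout::Error
    Process.kill("TERM", pid) # ask the stuck process to exit
    Process.wait(pid)         # reap it so no zombie is left behind
    :timed_out
  end
end
```

In CI this would be called with something like `run_with_deadline(["brew", "fetch", "--retry", "qt"], 3600)`, where the one-hour limit is an arbitrary example value.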

cho-m avatar Aug 29 '25 16:08 cho-m

Or at least a max timeout for fetch in CI so that long-timeout runners don't get stuck.

@cho-m yup, this makes sense to do in brew test-bot.

MikeMcQuaid avatar Aug 29 '25 18:08 MikeMcQuaid

Hi all, thanks for the work on this! I'm no brew pro but am I right to assume that setting HOMEBREW_DOWNLOAD_CONCURRENCY=auto only speeds up brew install and not brew bundle installs?

The reason I ask is I tried brew bundle install and it appeared to be running in parallel; then I ran brew bundle list --formula | xargs brew install --verbose and it seemed to run faster.

techieshark avatar Oct 08 '25 11:10 techieshark

@techieshark Yes, those will not be fetched in parallel as-is. Some additional work for brew bundle would be required for that.

MikeMcQuaid avatar Oct 08 '25 13:10 MikeMcQuaid

I'm not sure if I should make a separate issue for this, but IMO this is a worse UI experience right now as there is no information about the progress of the concurrent downloads. Especially for some of the larger bottles like llvm, this makes brew just sit there with a spinner for over a minute, which doesn't tell me if it's stalled out or still running.

octylFractal avatar Oct 24 '25 07:10 octylFractal

@octylFractal Yes, the UI experience is worse. We'll review PRs to fix that but it's not a blocker on rolling this out to users given the huge performance increases for many cases.

MikeMcQuaid avatar Oct 24 '25 07:10 MikeMcQuaid

  • ~~Implement graceful cancellation of downloads. Currently, cancelling downloads can only be done by killing the whole thread pool, i.e. the sledgehammer approach. Proper cancellation based on https://ruby-concurrency.github.io/concurrent-ruby/master/Concurrent/Cancellation.html should be implemented, making it possible to neatly show successful, failed and cancelled downloads.~~

Noting that lack of graceful cancellation means users should avoid ctrl+c-ing brew install/upgrade, otherwise they may need to manually fix up some incomplete installations.

I've done this a number of times now while testing PRs where I quickly ctrl+c (or brew install -s fails) and end up with dozens of formulae missing INSTALL_RECEIPT.json. This means brew cannot autoremove these formulae.

cho-m avatar Oct 31 '25 13:10 cho-m

@cho-m Can you open a reproducible issue or ideally a PR for this? Thanks!

MikeMcQuaid avatar Oct 31 '25 13:10 MikeMcQuaid

@cho-m Can you open a reproducible issue or ideally a PR for this? Thanks!

I'll try to when I get a chance. It should be simple to reproduce in a container with a formula that has numerous dependencies (e.g. brew install qt, wait, and then Ctrl+C before it completes).

I think the issue is due to pouring during the download phase and then only writing the tab during the install phase. So, if you Ctrl+C during download, it never hits the install part.

In old approach, we have a lot of exception handling to avoid this, e.g. https://github.com/Homebrew/brew/blob/f9553b6e0922842fe12114e4ad86bac0cc3c43d2/Library/Homebrew/formula_installer.rb#L596-L598

But we don't have similar handling in https://github.com/Homebrew/brew/blob/f9553b6e0922842fe12114e4ad86bac0cc3c43d2/Library/Homebrew/retryable_download.rb#L81-L84

EDIT: Though just handling the exception above won't be enough. We also need to handle rolling back a successful download/pour that did not reach the formula_installer.
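The rollback idea can be sketched in isolation like this. It does not reflect the real code in `formula_installer.rb` or `retryable_download.rb`: `pour_and_record` is a hypothetical stand-in, the directory creation stands in for pouring, and the receipt contents are made up.

```ruby
require "fileutils"
require "json"

# Hypothetical sketch: "pour" into keg_dir, then write the receipt. If anything,
# including Ctrl+C, happens in between, remove the half-finished keg.
def pour_and_record(keg_dir)
  FileUtils.mkdir_p(keg_dir) # stands in for pouring the bottle
  yield if block_given?      # an interrupt may arrive at this point
  receipt = File.join(keg_dir, "INSTALL_RECEIPT.json")
  File.write(receipt, JSON.generate({ "poured_from_bottle" => true }))
rescue Exception # rubocop:disable Lint/RescueException
  # Interrupt is not a StandardError, so a bare `rescue` would miss Ctrl+C.
  FileUtils.rm_rf(keg_dir)
  raise
end
```

The point of the sketch is that the pour and the receipt are treated as one unit: either both exist afterwards or neither does, so no keg is left without its INSTALL_RECEIPT.json.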

cho-m avatar Oct 31 '25 14:10 cho-m

I have another issue with the Ctrl+C interrupt behavior of the new concurrent downloader. It interrupts the command line output but does not actually interrupt the curl process, so downloads continue in the background which is probably not what you want if you are trying to interrupt some huge download.
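For illustration, forwarding the interrupt to a child process could look like the following. This is a generic stdlib sketch, not the actual downloader code: `run_child` is a hypothetical helper, and in the real case the command would be the curl invocation.

```ruby
# Hypothetical sketch: wait for a child process, and if the waiting thread is
# interrupted, terminate the child instead of leaving it running in the background.
# Returns [status, pid].
def run_child(command)
  pid = Process.spawn(*command)
  Process.wait(pid)
  [:finished, pid]
rescue Interrupt
  if pid
    Process.kill("TERM", pid) # stop the child that Ctrl+C did not reach
    Process.wait(pid)         # reap it
  end
  [:interrupted, pid]
end
```

Without the `rescue Interrupt` branch, the Ruby side stops waiting but the child keeps downloading, which matches the behavior described above.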

wickles avatar Nov 08 '25 21:11 wickles

Enable concurrent downloads by default i.e. default HOMEBREW_DOWNLOAD_CONCURRENCY to auto and allow HOMEBREW_DOWNLOAD_CONCURRENCY=1 as opt-out.

This was done in https://github.com/Homebrew/brew/pull/20975.

injust avatar Nov 10 '25 16:11 injust