Use binary patching to improve relocatability of bottles

Open danielnachun opened this issue 3 years ago • 15 comments

Provide a detailed description of the proposed feature

Currently we have some bottles which are non-relocatable – they can only be used if they are installed in the default cellar locations we use in CI. This is because these bottles contain binaries with strings which reference the cellar.

I propose we make all of our binaries relocatable by implementing an approach used by Anaconda: https://docs.conda.io/projects/conda-build/en/latest/resources/make-relocatable.html. This solution would require two changes to Homebrew:

  1. Build bottles with a long placeholder prefix (Anaconda uses 255 characters).
  2. During keg relocation, if a bottle does not have a relocatable cellar, use binary patching to replace the placeholder prefix with the user’s cellar, with null padding to make it the same length as the placeholder.

As long as the user does not have an install prefix longer than the placeholder prefix, this should make all binaries we build relocatable. Note that it will require rebuilding all bottles with non-relocatable cellars (about 1000 formulae).
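To make step 2 concrete, here is a minimal Python sketch of null-padded prefix replacement, loosely modelled on the idea in the Anaconda code linked below. It is not Homebrew's or conda's actual implementation: the function name `patch_prefix` and the sample placeholder are hypothetical, and a real version would also need to handle codesigning and strings that are not NUL-terminated.

```python
import re


def patch_prefix(data: bytes, placeholder: bytes, new_prefix: bytes) -> bytes:
    """Replace `placeholder` inside NUL-terminated strings, keeping the total size.

    Hypothetical helper for illustration only.
    """
    if len(new_prefix) > len(placeholder):
        raise ValueError("new prefix is longer than the placeholder prefix")

    # Match from the placeholder to the end of its C string (the first NUL),
    # so the padding goes after the whole string rather than mid-path.
    pattern = re.compile(re.escape(placeholder) + rb"[^\x00]*\x00")

    def _replace(match: "re.Match[bytes]") -> bytes:
        original = match.group(0)
        patched = original.replace(placeholder, new_prefix)
        # Pad with NUL bytes so every later offset in the file is unchanged.
        return patched + b"\x00" * (len(original) - len(patched))

    return pattern.sub(_replace, data)


if __name__ == "__main__":
    blob = b"\x7fELF" + b"/PLACEHOLDER_PREFIX/lib/libfoo.so\x00" + b"tail"
    patched = patch_prefix(blob, b"/PLACEHOLDER_PREFIX", b"/opt/hb")
    print(len(blob) == len(patched))
```

Padding at the end of the string, not directly after the prefix, matters: a NUL placed right after the new prefix would truncate the path for any C code reading it.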

On macOS ARM64, Anaconda has to codesign the patched binaries, which makes it likely we will need to do the same.

For reference, the binary relocation in Anaconda is implemented here: https://github.com/conda/conda/blob/6fc320aa625939db271c2227d935e4588d277b61/conda/core/portability.py.

What is the motivation for the feature?

Using a non-default cellar is probably the most common reason for a user to still have to build from source. Making all our bottles relocatable would resolve this. It would also make a lot of other features easier to implement, like installing bottles using the JSON API.

How will the feature be relevant to at least 90% of Homebrew users?

It would greatly improve the experience for all users who use non-default cellars. We don’t officially support building from source, so we can only support non-default cellars on a best effort basis. In addition, as mentioned above, it would make it easier to implement other features that will benefit the vast majority of users.

What alternatives to the feature have been considered?

The current alternative is to build these formulae from source.

danielnachun avatar Mar 14 '21 04:03 danielnachun

this is a distinct minority of our formulae

It's a significant minority though:

❯ rg --files-without-match 'cellar: :any' | wc -l
1553

Not that I'm opposed to this. All bottles being relocatable would be cool.

carlocab avatar Mar 14 '21 04:03 carlocab

I updated the post to add that number, thanks for providing that! I'm hoping that it will be one of those situations where most rebuild fine - we're not actually changing the formulae themselves so whatever fails would have failed the next time it was rebuilt anyway.

danielnachun avatar Mar 14 '21 05:03 danielnachun

One question for which feedback would be great: I'm thinking it could be valuable to have a new cellar type like relocatable, which would mean a bottle could now be relocated using our patching strategy. The advantage of doing that would be that we wouldn't have to get all bottles rebuilt right away and could focus on the most popular and difficult formulae first.

danielnachun avatar Mar 14 '21 05:03 danielnachun

I like this idea and would like to see it move forward. I think it should be something that we feature-flag (i.e. set with an environment variable) and do local testing in a few hundred bottles before we roll it out more widely. We ship a much wider variety of software than Anaconda so I'm optimistic about this increasing the number of reproducible bottles but dubious (please prove me wrong!) that it'll work with all our currently non-relocatable bottles.

MikeMcQuaid avatar Mar 15 '21 11:03 MikeMcQuaid

I think it should be something that we feature-flag (i.e. set with an environment variable) and do local testing in a few hundred bottles before we roll it out more widely.

@sjackman had a really good suggestion for how we could feature-flag part 2: initially we would only support patching for prefixes whose length is less than or equal to that of the prefixes we currently use in CI. This gives the most flexibility on Linux, where /home/linuxbrew/.linuxbrew is 26 characters, and the least flexibility for macOS Intel, where /usr/local is only 10. Nonetheless, one could use something like /opt/hb, which would be shorter than any of those, for testing purposes.
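As a rough illustration of that length check, here is a small Python sketch. The prefixes are the ones mentioned in this thread; the dictionary keys and the function name `fits_default_prefix` are made up for the example and are not part of Homebrew.

```python
# Default prefixes mentioned in this thread; the keys are illustrative labels.
DEFAULT_PREFIXES = {
    "linux": "/home/linuxbrew/.linuxbrew",
    "macos-intel": "/usr/local",
    "macos-arm": "/opt/homebrew",
}


def fits_default_prefix(platform: str, user_prefix: str) -> bool:
    """True if `user_prefix` is no longer than the platform's default prefix.

    Hypothetical check: only such prefixes could be patched into bottles
    that were built with the default prefix rather than a long placeholder.
    """
    return len(user_prefix) <= len(DEFAULT_PREFIXES[platform])


if __name__ == "__main__":
    for platform, prefix in DEFAULT_PREFIXES.items():
        print(platform, prefix, len(prefix), fits_default_prefix(platform, "/opt/hb"))
```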

I was thinking about how to feature-flag bottles with the long build prefix, and this will definitely be harder. The reason is that once we build with a long prefix, we have to patch prefixes even when installing to the default cellars. The safest way to deploy this would be to temporarily build the bottles of interest with two prefixes:

  1. The default prefix, which would require no patching to use in the default cellars but wouldn't be fully relocatable
  2. The long placeholder prefix, which would require patching for all cellars including the defaults but would be fully relocatable.

If we go by that approach, we could fully test patching the most popular non-relocatable formulae built with long prefixes to work in the default prefix. Once we were confident we weren't breaking anything widely used, we could switch to only using the long placeholder prefix for building bottles. This could be done on a per-bottle basis in case some formulae were particularly difficult to deal with. We could also limit this initially to Linux, where non-default cellars are much more common, and then expand to macOS once we'd ironed out some of the bumps.

Although this would temporarily increase the build times for those formulae (because the bottle would have to be built twice), the upside to all this is that eventually there would hopefully be little distinction between "default" and "non-default" prefixes.

danielnachun avatar Mar 15 '21 22:03 danielnachun

What's the benefit to building two different kinds of bottles? What determines which kind of bottle a user gets?

carlocab avatar Mar 15 '21 22:03 carlocab

We ship a much wider variety of software than Anaconda so I'm optimistic about this increasing the number of reproducible bottles but dubious (please prove me wrong!) that it'll work with all our currently non-relocatable bottles.

This makes me think one data point that would be helpful here is to query Anaconda's API to see how many of our non-relocatable formulae have been built with Anaconda (if anyone knows how to do this, that would be super helpful!). Although a software package building with Anaconda successfully doesn't guarantee it will work correctly in Homebrew, I think it would make success more likely.

danielnachun avatar Mar 15 '21 22:03 danielnachun

I overcounted a little (just by about 50%...):

❯ rg --files-without-match '(cellar: :any)|(bottle :unneeded)' | wc -l
989

carlocab avatar Mar 15 '21 22:03 carlocab

What's the benefit to building two different kinds of bottles? What determines which kind of bottle a user gets?

Great question. Right now, when a user tries to install a non-relocatable formula, we check if HOMEBREW_PREFIX matches the prefix we used to build in CI. If it does, we install the bottle and no patching needs to be done for it to work. If HOMEBREW_PREFIX does not match the CI build prefix, then the user must install from source.

The reason we might have to consider temporarily building two different kinds of bottles is that once you build a non-relocatable bottle with some long placeholder prefix (/tmp/homebrew_placeholder_placeholder_placeholder...), it will need to be patched to work in all cellars, including the default one. If we were very confident in our binary patching (which, as I described above, can be tested extensively just by using a very short install prefix), then we could just switch to using the long build prefix when building a bottle.

The reason I had suggested possibly building two bottles is so that, if there is some issue with the patching, the bottle built with the default prefix in CI will still work without any patching for users whose HOMEBREW_PREFIX is the default prefix. This would basically be a "fallback" so that we wouldn't break everything.

As I write this up, I guess the other option would just be to fall back to building from source if there is a patching issue. That's not ideal either but would certainly be simpler!
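The decision flow described in this comment, including the simpler fallback to building from source, could be sketched roughly like this in Python. All names here (`choose_install`, `try_patch`, the returned strings) are hypothetical and only illustrate the logic; this is not how brew is actually structured.

```python
from typing import Callable


def choose_install(
    cellar_is_relocatable: bool,
    user_prefix: str,
    build_prefix: str,
    try_patch: Callable[[], bool],
) -> str:
    """Pick an install strategy for a bottle (illustrative sketch only)."""
    # Today's behaviour: relocatable bottles, or a prefix that matches the
    # CI build prefix, can be poured without any binary patching.
    if cellar_is_relocatable or user_prefix == build_prefix:
        return "pour bottle"
    # Proposed behaviour: patch the binaries if the user's prefix fits in the
    # space of the build prefix, and only keep the result if patching succeeds.
    if len(user_prefix) <= len(build_prefix) and try_patch():
        return "pour bottle with binary patching"
    # Fallback discussed above: build from source if patching is not possible.
    return "build from source"


if __name__ == "__main__":
    print(choose_install(False, "/opt/hb", "/usr/local", lambda: True))
```

Passing the patch step in as a callable keeps the sketch independent of how the patching itself is done.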

danielnachun avatar Mar 15 '21 22:03 danielnachun

I overcounted a little (just by about 50%...):

❯ rg --files-without-match '(cellar: :any)|(bottle :unneeded)' | wc -l
989

Wow, that's really not bad given that we have ~5500 formulae - less than 20%!

danielnachun avatar Mar 15 '21 22:03 danielnachun

@danielnachun Sorry, what I'm proposing is we don't distribute any of these bottles (with the longer prefix) to users until we've verified the approach on our local machines. Feature flagging would allow this code to still be on master but without actually being used yet other than maintainers opting-in locally.

I definitely don't think we should be building two types of bottles but, as with most global bottle changes, this is going to need a lot of manual verification before we change what we ship to users.

MikeMcQuaid avatar Mar 16 '21 09:03 MikeMcQuaid

We can build just the one bottle with prefix /home/linuxbrew/.linuxbrew and roll out binary relocation to users whose brew --prefix is no more than 26 characters in this order…

  1. HOMEBREW_DEVELOPER
  2. homebrew.devcmdrun
  3. All users

Once the feature has been in use by all users for some number of months, and we're confident with it, then we can consider switching to building bottles with a long placeholder prefix.
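A small Python sketch of how that staged gate might look. HOMEBREW_DEVELOPER and homebrew.devcmdrun are the settings named above; the `RolloutStage` enum, the function name, and the assumption that each stage also includes the users from earlier stages are mine, purely for illustration.

```python
from enum import IntEnum


class RolloutStage(IntEnum):
    DEVELOPER = 1   # users with HOMEBREW_DEVELOPER set
    DEVCMDRUN = 2   # users with homebrew.devcmdrun set
    ALL_USERS = 3


# Length of the prefix the single bottle would be built with.
MAX_PREFIX_LENGTH = len("/home/linuxbrew/.linuxbrew")


def relocation_enabled(
    stage: RolloutStage, prefix: str, is_developer: bool, devcmdrun: bool
) -> bool:
    """Decide whether binary relocation is offered to this user (sketch only)."""
    if len(prefix) > MAX_PREFIX_LENGTH:
        return False  # the prefix would not fit in the patched strings
    if stage >= RolloutStage.ALL_USERS:
        return True
    if stage >= RolloutStage.DEVCMDRUN:
        # Assumption: later stages keep earlier groups enabled.
        return is_developer or devcmdrun
    return is_developer


if __name__ == "__main__":
    print(relocation_enabled(RolloutStage.DEVCMDRUN, "/opt/hb", False, True))
```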

sjackman avatar Mar 16 '21 16:03 sjackman

@danielnachun Sorry, what I'm proposing is we don't distribute any of these bottles (with the longer prefix) to users until we've verified the approach on our local machines. Feature flagging would allow this code to still be on master but without actually being used yet other than maintainers opting-in locally.

I definitely don't think we should be building two types of bottles but, as with most global bottle changes, this is going to need a lot of manual verification before we change what we ship to users.

Thanks for clarifying, it makes perfect sense now! Basically we would build bottles locally with the long prefix for popular non-relocatable formulae and distribute them for testing. When we get closer to that step I'll start investigating what kinds of tests we should use and how best to distribute those bottles.

danielnachun avatar Mar 16 '21 23:03 danielnachun

We can build just the one bottle with prefix /home/linuxbrew/.linuxbrew and roll out binary relocation to users whose brew --prefix is no more than 26 characters in this order…

  1. HOMEBREW_DEVELOPER
  2. homebrew.devcmdrun
  3. All users

Once the feature has been in use by all users for some number of months, and we're confident with it, then we can consider switching to building bottles with a long placeholder prefix.

Although we wouldn't get as many regular users doing this, we could also do this on macOS by having testers use a prefix like /opt/hb which is shorter than either /usr/local or /opt/homebrew.

danielnachun avatar Mar 16 '21 23:03 danielnachun

We can build just the one bottle with prefix /home/linuxbrew/.linuxbrew and roll out binary relocation to users whose brew --prefix is no more than 26 characters in this order… Although we wouldn't get as many regular users doing this, we could also do this on macOS by having testers use a prefix like /opt/hb which is shorter than either /usr/local or /opt/homebrew.

Yeh, these seem like good tests to do before we start modifying the way we ship bottles 👍🏻

MikeMcQuaid avatar Mar 17 '21 09:03 MikeMcQuaid

Closing this out because work isn't really ongoing here but happy to continue discussion in here.

MikeMcQuaid avatar Feb 22 '23 16:02 MikeMcQuaid