
Build puller/loader from source for repository rules

Open gravypod opened this issue 2 years ago • 37 comments

PR Checklist

Please check if your PR fulfills the following requirements:

  • [ ] Tests for the changes have been added (for bug fixes / features)
  • [ ] Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • [ ] Bugfix
  • [ ] Feature
  • [ ] Code style update (formatting, local variables)
  • [ ] Refactoring (no functional changes, no api changes)
  • [ ] Build related changes
  • [ ] CI related changes
  • [ ] Documentation content changes
  • [ ] Other... Please describe:

What is the current behavior?

Issue Number: N/A

What is the new behavior?

Does this PR introduce a breaking change?

  • [ ] Yes
  • [ ] No

Other information

gravypod avatar Jul 23 '21 20:07 gravypod

Just a bit curious: what's the motivation for this PR? Also, what is the current status? Do you need any help, or are you hitting any blockers?

sluongng avatar Aug 04 '21 05:08 sluongng

@pcj, so I've done another pass and it looks like the pusher is not something that needs to be built during the repository phase. What we cooked up that night was mostly right. I think the last thing I need to add is some sort of CI check that the go.mod is up to date and in sync with the Bazel files, which I'm not too sure about. I'll ping some people from the Bazel Slack.

gravypod avatar Aug 06 '21 22:08 gravypod

Just a bit curious: what's the motivation for this PR? Also, what is the current status? Do you need any help, or are you hitting any blockers?

Sorry about the delayed reply @sluongng. My main blocker was balancing this with other work. Right now rules_docker downloads a copy of the puller binary during the repository phase; this is what backs container_pull()s in your WORKSPACE file. We do it this way because during the repository phase we do not yet have access to go_binary() (that's loaded in later) and so cannot build binaries. This PR does something slightly odd that some other Go tooling also does: it uses go build in the repository phase to make a hermetic build of our tooling for your local platform.
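For readers who haven't used it: container_pull is a repository rule, so it runs during the loading phase, before any build action. A minimal usage sketch (the target name and digest below are placeholders):

```python
# WORKSPACE -- container_pull runs during the repository (loading) phase,
# before any go_binary() target could ever be built.
load("@io_bazel_rules_docker//container:container.bzl", "container_pull")

container_pull(
    name = "ubuntu_base",           # placeholder name
    registry = "index.docker.io",
    repository = "library/ubuntu",
    digest = "sha256:...",          # pin by digest for reproducibility
)
```

This is why the puller binary must already exist (downloaded or freshly compiled) by the time the WORKSPACE is evaluated.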

The main motivations are as follows:

  1. None of us have access to the GCS bucket that rules_docker downloaded from previously.
  2. The CI for updating the puller broke at some point and no one knows how it was configured.
  3. Adding a new OS (Linux, Windows, macOS) and architecture (x86, ARM, etc.) pair requires us to build and push a new binary in CI.

This is obviously not ideal and was causing the following platforms to not be supported:

  1. M1 Macbooks
  2. Windows

By merging this we will remove the need to download an external binary to use rules_docker, in exchange for a hard dependency on rules_go. This is not a perfect win in my book, as I'd prefer not to depend on rules_go, but many parts of the repo already depend on it anyway.

gravypod avatar Aug 06 '21 22:08 gravypod

Hi, just wanted to leave some findings here after we tried testing this PR out on Windows. Unfortunately, we're still unable to get the rules to work on Windows:

  • this action doesn't run on Windows: https://github.com/bazelbuild/rules_docker/blob/42157c239f10cb05b9e2bcdb933ee719f1952dd2/docker/package_managers/download_pkgs.bzl#L115-L121 and we get the following error:

    Action failed to execute: java.io.IOException: ERROR: src/main/native/windows/process.cc(202): CreateProcessW("C:\bzl\xf6b35ik\execroot\embark\bazel-out\x64_windows-fastbuild\bin\pkgs.sh"): %1 is not a valid Win32 application.
    

    We needed to patch it to use a bash-compatible action to run the generated download_pkgs script.

  • the generated script itself isn't compatible with Windows file paths. The toolchain currently uses `which` to find the docker CLI path, which on Windows can return something like C:\Program Files\Docker\... (even after converting backslashes to forward slashes); the download script commands then fail with something like

C:/Program: No such file or directory
  • Beyond patching the file path issue in the script, the Docker daemon itself also seems to reject the Windows file paths the client sends it; at some point we get the following error back from Docker:
docker: Error response from daemon: the working directory 'C:/Program Files/Git/' is invalid, it needs to be an absolute path.
See 'docker run --help'.

iffyio avatar Aug 19 '21 09:08 iffyio

@iffyio I definitely need to do more testing on Windows. This is only a first step towards adding support for other platforms. As a test, could you let me know if you're able to execute the puller that is being built by the repository rules? If you can, that's a big step in the right direction.

gravypod avatar Aug 19 '21 22:08 gravypod

Sorry about the delayed reply @sluongng. My main blocker was balancing this with other work. Right now rules_docker downloads a copy of the puller binary during the repository phase; this is what backs container_pull()s in your WORKSPACE file. We do it this way because during the repository phase we do not yet have access to go_binary() (that's loaded in later) and so cannot build binaries. This PR does something slightly odd that some other Go tooling also does: it uses go build in the repository phase to make a hermetic build of our tooling for your local platform.

The main motivations are as follows:

  1. None of us have access to the GCS bucket that rules_docker downloaded from previously.
  2. The CI for updating the puller broke at some point and no one knows how it was configured.
  3. Adding a new OS (Linux, Windows, macOS) and architecture (x86, ARM, etc.) pair requires us to build and push a new binary in CI.

@gravypod Hey. Yeah, these motivations are clear to me now. But I have a word of caution:

Many folks in different open-source projects using Bazel prefer to have a binary dependency rather than building the binary themselves, to avoid expensive dependency downloads. One big example is depending on the kubectl binary: to compile it, you would need to download the entire dependency tree of the Kubernetes project at that release, which adds hundreds of MB to the bootstrap process of a project.

So it might be worth clarifying (perhaps in the changelog once this is merged) how much downloaded data the switch from binary dependencies to 'compile from scratch' dependencies would add, and which additional Go libs would need to be downloaded and compiled.

It would be even better if we could have a knob somewhere that lets people override these binary dependencies with their own 🤔. That way they could replace them with a binary dependency if needed, or specify an existing build target in their workspace to have them compiled from scratch. But that does not need to be in the same scope as this PR.

Thanks.

sluongng avatar Aug 21 '21 09:08 sluongng

Many folks in different open-source projects using Bazel prefer to have a binary dependency rather than building the binary themselves, to avoid expensive dependency downloads. One big example is depending on the kubectl binary: to compile it, you would need to download the entire dependency tree of the Kubernetes project at that release, which adds hundreds of MB to the bootstrap process of a project.

@sluongng, that was exactly my approach as well. I made an initial prototype of what that might look like in my personal binaries-in-repo branch. In that branch I set it up so that we could refer to the precompiled puller binaries as files and choose the correct binary at runtime during the rule's execution. I also added a test to validate in presubmit that the binaries were up to date.

The other maintainers on rules_docker pointed out these issues:

  1. This is currently ~12 MB per (OS, arch) pair and will only grow as we add more functionality.
  2. If we wanted to slim this down we would need a rules_docker_{os}_{arch}.zip distribution, but this was considered too difficult.
  3. This would grow the git repo size. This wasn't my main concern, as I'm pretty used to using git-lfs, but it was raised. To mitigate it we'd have to go back to using http_file, which is a pretty bad development UX: if you change the puller/pusher, there'd be no way to test that code in CI without first pushing the puller to an object store. It's all possible, it just reduces velocity. It's much easier to run something like ibazel run //precompiled:update_precompiled and have any changes automatically built for you.
  4. We'd need to manually curate the list of OSs and architectures to build for. This isn't impossible, since there's a finite number, but it is annoying for users of more obscure platforms (I think we had some people using this rule set on s390x?).

That's why we're trending this way. If there are easy solutions to these that you can point to I'd much rather adopt them. This approach is pretty hacky.
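To illustrate point 4: the puller is needed inside a repository rule, where select() is unavailable, so a precompiled-binaries approach has to dispatch on the host platform by hand. A sketch, with hypothetical file names:

```python
# Sketch only: picking a checked-in puller binary inside a repository rule.
# select() cannot be used here, so the platform is detected manually.
def _puller_path(repository_ctx):
    os_name = repository_ctx.os.name  # e.g. "linux", "mac os x", "windows 10"
    arch = "amd64"  # real code would also detect the architecture
    if os_name.startswith("windows"):
        return "precompiled/puller-windows-%s.exe" % arch
    if os_name.startswith("mac"):
        return "precompiled/puller-darwin-%s" % arch
    return "precompiled/puller-linux-%s" % arch
```

Every (OS, arch) pair that isn't in such a table is simply unsupported, which is the curation burden described above.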

So it might be worth clarifying (perhaps in the Change Log once this is merged) whether switching from Binary dependencies to 'compile from scratch' dependencies would add X amount of downloaded data from Y combination of additional Go libs needed to be downloaded and compile.

Since we are using Go's vendoring feature, there would be no other downloads besides the source in our repository. However, it's worth pointing out that all of this source code has materially increased the size of our repo:

➜  Downloads du -h rules_docker-build-go-cmds-in-repository-rule.zip rules_docker-master.zip
1.5M	rules_docker-build-go-cmds-in-repository-rule.zip
856K	rules_docker-master.zip

That's a 1.7x increase in repo size. This is, however, much smaller than the binaries we were downloading before:

➜  Downloads wget https://storage.googleapis.com/rules_docker/aad94363e63d31d574cf701df484b3e8b868a96a/puller-linux-amd64
--2021-08-21 09:50:37--  https://storage.googleapis.com/rules_docker/aad94363e63d31d574cf701df484b3e8b868a96a/puller-linux-amd64
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.35.176, 142.251.32.112, 142.250.81.240, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.35.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7955988 (7.6M) [application/octet-stream]
Saving to: ‘puller-linux-amd64’

puller-linux-amd64  100%[===================>]   7.59M  43.6MB/s    in 0.2s    

2021-08-21 09:50:38 (43.6 MB/s) - ‘puller-linux-amd64’ saved [7955988/7955988]

So, if we are looking at this from an "absolute change in download size" we are saving ~7 MB and removing network access assuming you use local_repository().

It would be even better if we could have a knob somewhere that lets people override these binary dependencies with their own. That way they could replace them with a binary dependency if needed, or specify an existing build target in their workspace to have them compiled from scratch. But that does not need to be in the same scope as this PR.

Agreed, this is a good idea. This is also a binary that, technically, not everyone actually needs. I'd like rules_docker to shift away from providing cc_image and go_image and instead make it easy to take a bunch of binaries + files and put them into image layers. Macros like cc_image could then be built on top of this base functionality. We could then use something like a rules_distroless to fetch all of the packages and create base images hermetically, which would sidestep the need for most users to ever use container_pull. We're a long way off from this kind of thinking, but anything we can do to reduce the feature set of rules_docker, or to let users opt out and supply their own, would be great. I see rules_docker as rules_pkg with metadata.

gravypod avatar Aug 21 '21 13:08 gravypod

I think it's quite reasonable for us to publish releases to GitHub consisting of the existing .tgz as well as a distribution of the puller binary for each platform. rules_go makes it quite easy for us to build and support all the popular OS+arch combinations, similar to how https://github.com/bazelbuild/buildtools/releases/tag/4.0.1 does their pushes.

Then we'd include a Bazel repository rule and toolchain in rules_docker that provides the puller for your platform. We have some of these in rules_nodejs that I can describe if that's helpful, for fetching the node, esbuild, and cypress binaries.

Reducing the dependencies here isn't just about network size. It's also sometimes a spaghetti project to make all your WORKSPACE deps happy with the rules_go we'd require (and maybe bazel_gazelle, to get the go_repository rule?). You might have to upgrade your rules_go, and the error messages won't make it clear that that's the case. I'd be happy to set aside some time to help with publishing statically linked puller binaries as part of the release process to make this possible.
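A sketch of what fetching per-platform release artifacts could look like (the release URL, version, and repository names are hypothetical, not an existing release):

```python
# Sketch: one http_file per platform, downloaded from a hypothetical
# GitHub release and marked executable so repository rules can invoke it.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

def puller_repositories():
    for goos, goarch in [("linux", "amd64"), ("darwin", "arm64"), ("windows", "amd64")]:
        ext = ".exe" if goos == "windows" else ""
        http_file(
            name = "puller_%s_%s" % (goos, goarch),
            urls = ["https://github.com/bazelbuild/rules_docker/releases/download/vX.Y.Z/puller-%s-%s%s" % (goos, goarch, ext)],
            executable = True,
            # sha256 = "...",  # each artifact would be pinned in real code
        )
```

A toolchain or a small repository rule would then select the right @puller_* repository for the host, which is the rules_nodejs pattern mentioned above.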

alexeagle avatar Aug 23 '21 00:08 alexeagle

The deps need to be vendored for the trick to work. This is the same reason for the vendor directory in bazel-gazelle: https://github.com/bazelbuild/bazel-gazelle/tree/master/vendor.

On Sun, Aug 22, 2021 at 6:38 PM Alex Eagle wrote:

is there a reason to vendor the dependencies? I'd much rather use go_repository to fetch them.

I just recently added some Go code in rules_python ( bazelbuild/rules_python#514 https://github.com/bazelbuild/rules_python/pull/514) so the pattern for getting the dependency resolution setup is quite fresh.


pcj avatar Aug 24 '21 02:08 pcj

@pcj, do you think it would make sense to use gazelle to generate BUILD files for our vendor folder instead of having two ways of specifying deps? This would help make sure that the two build chains never fall out of sync.

gravypod avatar Aug 24 '21 02:08 gravypod

FYI, I tested this on Linux ppc64le to pull an image, it worked fine. Thanks 👍

MatthieuSarter avatar Aug 25 '21 18:08 MatthieuSarter

@iffyio I definitely need to do more testing on windows. This is only a first step towards adding support for other platforms. As a test, could you let me know if you're able to execute the puller that is being built by the repository rules? If we can then that's a big step in the right direction.

Hi @gravypod, sorry about the slow response. I can confirm that the puller does in fact work in our repo on Windows, yes!

iffyio avatar Aug 31 '21 08:08 iffyio

Friendly ping to @alexeagle and @pcj. If we can get some reviews let me know.

gravypod avatar Sep 16 '21 17:09 gravypod

@gravypod I think the code looks good (I didn't see any changes since we paired on it; is there a particular spot I should look at?). I think it is mostly missing documentation.

As for Windows, yes, that should be tested more thoroughly. I'm not sure whether Windows is considered a supported platform for rules_docker. @philwo, is Windows a possibility on Bazel's Buildkite? If not, it might be worth setting up a GitHub action to test this part of the rules.

pcj avatar Sep 16 '21 21:09 pcj

As a user of these rules, I'd be hugely -1 on invoking go and trying to fetch/require Go externally. This brings in a whole other set of dependencies and odd behaviors whose behavior on every user's machine we would have to care about. If we can use GitHub Actions to build these, then we get to centralize the pain of building and distributing binaries for the repository rules. Alas, Bazel just isn't great here about being able to build/bundle things like this :/

Building per-platform binaries as part of a release, and baking that into the release guide/scripts, is what I'd much rather do for this repository-rule-level stuff. GitHub Actions has been great for this; the only thing I've needed that they don't have yet is an M1 Mac runner, though via Rosetta that really hasn't been a blocker for most tools. (Go can probably cross-compile too; I do that for one project that's Rust-based.) e.g. https://github.com/ianoc/bazel-deps/releases/tag/v0.1-46

ianoc avatar May 11 '22 17:05 ianoc

@gravypod would the approach I took in https://github.com/bazelbuild/rules_docker/pull/2086 work for you? GitHub Actions can build the binaries on demand. GitHub has Windows runners, so it can be extended to Windows (and use a Windows runner if Go doesn't easily cross-compile to Windows).

ianoc avatar May 12 '22 03:05 ianoc

My impression is that this PR is unlikely to proceed, because the right answer is to stop having custom binaries at all. There are plenty of tools in the ecosystem that can do this job without rules_docker needing to maintain one.

alexeagle avatar May 12 '22 16:05 alexeagle

Which tools do you mean, @alexeagle? This PR or mine?

I guess you mean either something to handle distribution or supply of binaries, or an off-the-shelf tool whose releases we should use instead? If so, which tool did you have in mind?

ianoc avatar May 12 '22 16:05 ianoc

@ianoc I think maybe you're misunderstanding how it works. There is no need to fetch Go externally; the toolchain is already there via rules_go, which is an absolute prerequisite for rules_docker. This is exactly how gazelle is built for the go_repository rule.

@alexeagle that is probably out of scope for this PR thread.

pcj avatar May 13 '22 00:05 pcj

I'll rebase and freshen this branch up.

pcj avatar May 13 '22 00:05 pcj

@pcj sorry, you're right. Though building in the repository rules means really confusing errors on failures, and Bazel can't cache this compile, right? Any reason you're against just having GitHub Actions build these binaries? (This is obviously an absolutely massive diff with huge amounts of vendored Go code.)

I myself would love it if we removed the Go dependency from these rules. Getting go + rules_docker + bazel + mac to behave nicely on updates, and to interact with other repos, has been a pain. We could just distribute all the binaries for the rules and give users a slimmer dependency tree?

ianoc avatar May 13 '22 00:05 ianoc

@ianoc A few comments:

  • Removing golang from rules_docker is IMHO out of scope; the effort is too large without a clear reason. If you really want containers with Bazel and not Go, I'd suggest starting a new repo.
  • The diff here is actually not big. Ignore LOC from vendored code, it's a useless metric.
  • The puller is used in a repository rule. As such, the puller binary would be part of the repository cache, not the action cache. But yes, it's cached.
  • I'm not sure what you mean by confusing errors (all bazel errors are confusing :) ). Is there a more specific concern?

pcj avatar May 13 '22 01:05 pcj

  • Could you speak to why it is large? The pattern that is in the repo now would support it easily, no? It just means adding a few more binaries to the same approach. The clear reason is that we don't use Go (and I assume plenty of others don't), yet even so, for dependencies such as this we have to chase down a compatible matrix of OS/Go/Bazel/rules_docker combinations, which seems unnecessary for a system such as rules_docker.
  • I'm not sure I'd agree it's useless; it's another set of dependencies and concerns, and it seems unnecessary.
  • Unless I'm mistaken, that's not how the repository cache works. Downloads are kept in that cache; actions taken by repository rules are not hermetic and are not cached by Bazel. They are just on-disk state. Changing branches back and forth, for instance, would blow them away with no caching.
  • I mean we have a different way to invoke Go than using rules_go. If a user gets an error, they have to care about custom code paths here, in every location they're used; about anything set up or configured on the command line; and about anything on disk that affects the commands run or the compiles, because they aren't sandboxed. All possible sources of issues.

ianoc avatar May 13 '22 01:05 ianoc

To restate the problem:

Note @ianoc this is not a response to your last comments, I was writing it independent of that.

  • The puller is used by pull.bzl container_pull, which is a repository rule. That means that the binary must be available by the time a container_pull rule is called by bazel, which is before any bazel actions would run, preventing one from building the puller via something like go_binary.
  • Historically, Cloud Build was used to construct new pullers, and a rules_docker maintainer would have to go update the source of the repositories/repositories.bzl file and update all the storage URLs and hashes. Not fun. Since Cloud Build for rules_docker was discontinued a long time ago, the puller hasn't been updated in years. I doubt anyone remembers or knows exactly what version of the puller we are actually using.
  • One can build artifacts via GitHub Actions, or any CI service; that's not the hard part. But you still have to periodically go and update the source repo, or invent some bot to do that. Extra work.
  • Also, every time someone wants a new architecture, it has to be explicitly provided (e.g. https://github.com/bazelbuild/rules_docker/issues/1790).

Gazelle has the same issue as rules_docker in that it both houses the source code, yet references it in a repository rule. For that reason, @jayconrod came up with this solution, which I think is positively brilliant. You just use the go tool from @go_sdk and basically just call go build inside a repository rule, setting up symlinks such that the source is visible to the compiler, and the output binary goes into the correct place, which in the end (in this case) is @rules_docker_repository_tools//:bin/puller.
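The shape of that trick, as a sketch (the rule name and source paths below are illustrative, not the PR's exact code):

```python
# Sketch of the gazelle-style trick: compile the puller with `go build`
# inside a repository rule, using the Go SDK Bazel has already downloaded.
def _repository_tools_impl(ctx):
    # The go tool from the @go_sdk repository that rules_go set up.
    go_tool = ctx.path(Label("@go_sdk//:bin/go"))

    # Locate the puller's source directory inside rules_docker
    # (illustrative label; vendored deps live alongside the module).
    src_dir = ctx.path(Label("@io_bazel_rules_docker//container/go/cmd/puller:puller.go")).dirname

    result = ctx.execute(
        [go_tool, "build", "-mod=vendor", "-o", str(ctx.path("bin/puller")), "."],
        working_directory = str(src_dir),
    )
    if result.return_code != 0:
        fail("failed to build puller:\n" + result.stderr)

    # Expose the built binary so container_pull can reference it, e.g.
    # as @rules_docker_repository_tools//:bin/puller.
    ctx.file("BUILD.bazel", 'exports_files(["bin/puller"])\n')

repository_tools = repository_rule(implementation = _repository_tools_impl)
```

Because the deps are vendored, the build needs no network access beyond what Bazel already fetched for the Go SDK and rules_docker itself.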

So, this PR makes it possible to support all architectures, without any extra work from maintainers, and with zero dependencies on external artifacts. I think that's a huge win.

pcj avatar May 13 '22 01:05 pcj

I think it is a real bummer to say that making a set of rules closer to hermetic is out of scope. You may think it is fine to require Go; the next rule set wants jq on the path; the next assumes something else. No single one is a giant deal, but as I thought was an agreed-upon principle in the Bazel community: the existence of each of them adds a tax when composing rules.

johnynek avatar May 13 '22 02:05 johnynek

@johnynek nice to hear from you, it's been a while since I sat next to you at BazelCon 4 or 5 years ago! It is a misunderstanding that this PR compromises hermeticity. The compiler tool will have the same hash as the user-requested @go_sdk//:bin/go, the source is at a known commit, and the deps are vendored. I don't see how jq is relevant; this isn't about using random tools expected to exist on the user's PATH.

pcj avatar May 13 '22 03:05 pcj

It might be interesting to point out how go_repository currently works. A go_repository target goes through the following steps during the Bazel loading phase:

  1. Download archive (via different channels)
  2. gazelle is run over downloaded archive
  3. Patches are applied (if there are any)

For these to happen at Bazel loading phase, go_repository depends on 2 go binaries: fetch_repo and gazelle.

So for a very long time, these two binaries have been prepared like this: https://github.com/bazelbuild/bazel-gazelle/blob/1f6da0ab55f094c2a33e21292f5e2db460b5d857/internal/go_repository_tools.bzl#L87-L101, where go install is invoked against the current archive of @bazel_gazelle, set up in a mock GOPATH.

This PR essentially uses that very same approach to prepare the Go binaries that rules_docker needs.

The problems that we are trying to solve with this PR are:

  1. How to keep the binaries relatively up to date compared to the rest of the repository
  2. While keeping the cost of maintaining the rules_docker project relatively manageable

With that said, I think there are sufficient community concerns regarding this PR that I would suggest we slow down its adoption:

  1. Let's merge the code that compiles the Go binaries, without actually using it. We continue to use the old pre-compiled binaries after this PR is merged.
  2. In a separate PR, let's introduce a flag, use_precompiled_binaries, that lets people optionally opt in to compiling the binaries as part of rules_docker. This flag defaults to True, so for now it effectively affects nobody.
  3. Prepare communication for a major release where we flip the default value of the flag. (Pinned issue, headline in README.md, announce in release notes etc...)
  4. Users who do not wish to opt in after the major release should still have the option to bring their own pre-compiled binaries. Of course, they will then have to manage feature compatibility as well as update the pre-compiled binaries themselves.
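For step 2, the knob might look something like this (the attribute name comes from the discussion above; the rest is a placeholder sketch, not implemented in this PR):

```python
# Hypothetical sketch of the proposed opt-out knob on container_pull.
container_pull(
    name = "ubuntu_base",           # placeholder name
    registry = "index.docker.io",
    repository = "library/ubuntu",
    digest = "sha256:...",
    # Default True: keep downloading the pre-built puller binary.
    # Set False to build the puller from source via the Go toolchain.
    use_precompiled_binaries = True,
)
```

Whether this lives on each container_pull or as a single repository-wide flag would be a design decision for that follow-up PR.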

@gravypod @pcj wdyt?

sluongng avatar May 13 '22 07:05 sluongng

To clarify a bit more on my suggestion:

I think this PR changes a behavior that many users have been relying on, and it's doing too many things at once: introducing new code, a new feature, and new dependencies, while removing an old feature.

While that might be totally fine in an organization that is used to moving fast and OK with disruptive changes, in the open-source world there is less appetite for such a move. So I have a feeling that if we deliver this feature more slowly, in smaller incremental steps, with enough support for backward compatibility during the transition period, it will be a much easier pill to swallow.

sluongng avatar May 13 '22 07:05 sluongng

@sluongng That's great context, thank you; I hadn't realized the Go rules do this inside themselves. I would still have reservations: they are the ones controlling the rules (and the Go version itself), and thus generally deserve more leeway for horrible self-bootstrapping hacks. In general, though, I think your compromise sounds pretty good.

Another aspect: I did open a PR which I believe would solve this issue without building Go this way. It's a much smaller diff that adds the capability for GitHub Actions to build these binaries, and the template would let us move in the other direction and eliminate the Go dependency of the rules if we wished in the future. Both have a synchronization challenge: this one with the vendoring of dependencies (which, if I'm not mistaken, means we have two sets of dependencies these tools can be built with: what Bazel would use, and what the vendored copies are), and the other with having to trigger a separate release to have GitHub build all the binaries and then update the repositories file. Both approaches, I believe, avoid the historic issues around updating and forking the current binaries.

ianoc avatar May 13 '22 15:05 ianoc

@sluongng Yes, thanks for making it clearer how the go_repository rule works. Note this is also how proto_repository works (https://github.com/stackb/rules_proto/blob/master/rules/private/proto_repository_tools.bzl).

I'd say if we really want to feature-flag it, let's just add the flag now so the community can test it. I'm not sure this is a "feature" though; all we are doing is changing the binary used by container_pull and container_load, and the behavior is the same. If anyone would like to try it now, you can test it via https://github.com/pcj/rules_docker_1918.

We can also add a gazelle update-repos rule and make the go.mod file the source of truth, another small improvement that is long overdue.
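That would follow the standard gazelle pattern, something like the following (target names and macro file are illustrative):

```python
# BUILD.bazel -- sketch of the standard gazelle update-repos wiring,
# making go.mod the source of truth for Go dependency declarations.
load("@bazel_gazelle//:def.bzl", "gazelle")

gazelle(name = "gazelle")

gazelle(
    name = "gazelle-update-repos",
    args = [
        "-from_file=go.mod",                  # read deps from go.mod
        "-to_macro=deps.bzl%go_dependencies", # write them to a macro
        "-prune",                             # drop repos no longer in go.mod
    ],
    command = "update-repos",
)
```

Running bazel run //:gazelle-update-repos would then regenerate the dependency macro whenever go.mod changes, keeping the two build chains in sync.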

pcj avatar May 13 '22 16:05 pcj