bazel-central-registry icon indicating copy to clipboard operation
bazel-central-registry copied to clipboard

[Idea] Support Starlark overlays without patches

Open aaronmondal opened this issue 1 year ago • 10 comments

At the moment all modules are overlaid with patches and those patches are hashed and tracked in source.json.

Many modules probably just need a MODULE.bazel file and some buildfiles. I think it would be nice UX improvement if those files could be put into the BCR directly and have some mechanism to "automatically" overlay these Bazel-specific files.

Since we already have MODULE.bazel in the BCR without patches, it might be viable to have some sort of overlay mechanism, maybe even an overlays directory that adds Starlark files directly on top of the external repository.

Such an approach would be somewhat similar to how it feels like to work in nixpkgs: Have the build overlays as regular source code and only use patches for things that are "true" functionality-changing patches.

I believe this could make it nicer to browse the BCR and get inspiration. It would also make it easier to spot functional changes and would potentially reduce the friction of having to generate patch files all the time.

aaronmondal avatar Mar 01 '24 21:03 aaronmondal

+1 it's really painful to have to encode new files as patches

alexeagle avatar Mar 29 '24 19:03 alexeagle

We could add a remote_files attribute to http_archive that takes a dict from repo-relative paths to URLs and downloads the files from these URLs to the given path (for simplicity, assuming that the original file didn't exist). We would need an additional remote_files_integrity dict to store the hashes.

Then registries could start populating this attribute in their source.json files, but they would only be compatible with Bazel versions with support for the new attributes. I need to check what an old Bazel version would do with such a source.json - hopefully it would fail rather than silently dropping the attributes. But we could always have a bazel_compatibility requirement for such modules.

@meteorcloudy @Wyverald @SalmaSamy What do you think?

fmeum avatar Apr 25 '24 08:04 fmeum

SGTM, PR welcome. And we could backport the change to 6.x and 7.x.

meteorcloudy avatar Apr 25 '24 12:04 meteorcloudy

I am interested in working on this. I think it would be a big QoL improvement.

@fmeum if could point me to some source files or maybe a PR that did something similar in changes to sources.json to mimic I can try that.

fzakaria avatar Apr 25 '24 17:04 fzakaria

I want to add not only is it annoying to submit patch files but the BUILD files aren't searchable on GitHub search using the starlark language filter. That would help expose more samples.

fzakaria avatar Apr 25 '24 17:04 fzakaria

Okay here is my research:

  1. Augment https://github.com/bazelbuild/bazel/blob/master/tools/build_defs/repo/http.bzl#L235 to add a new attribute for an overlay directory (auto) or the dict files (manual)
  2. Augment https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/bzlmod/ArchiveRepoSpecBuilder.java#L28
  3. Augment https://github.com/bazelbuild/bazel/blob/618c0abbfe518f4e29de523a2e63ca9179050e94/src/main/java/com/google/devtools/build/lib/bazel/bzlmod/IndexRegistry.java#L189 to handle support for the new fields in the source.json

Looks like (1) can be starlark only. (2) and (3) need to be changed for bzlmod but I'm not 100% certain of their innerworkings.

fzakaria avatar Apr 25 '24 17:04 fzakaria

@fzakaria That's highly appreciated! Your research looks solid, most of this can indeed be done in Starlark and I can also help with the Java parts.

For starters, I would suggest to focus on the Starlark part and augment http_archive with two new attributes:

  • remote_file_urls of type string_list_dict, mapping a relative path of a file to its source URLs, thus supporting mirrors. Remote patches currently don't support mirrors. @meteorcloudy Was that intentional?
  • remote_file_integrity of type string_dict, mapping the same relative path to the integrity value.

I think that attribute structure is going to be simpler than a full-blown overlay JSON, with a very similar experience: In the BCR, we could just have a files sibling directory of patches with files neatly laid out according to their relative path. The usual tooling (//tools:update_integrity in the BCR) could then add the integrity values to source.json in whatever format we want.

When implementing the download logic for the remote files, I would suggest to keep the following in mind:

  • Downloads of multiple files can be parallelized via the block parameter of download. This can't be cherry-picked into Bazel 6 though.
  • Before downloading the files, we should make sure that their relative paths can't result in path traversal outside the repository directory. We should forbid .. segments and also verify that the absolute path obtained by resolving the relative one against the repository directory still lies under the repository directory. As far as I know download_and_extract can't create symlinks to absolute paths that are directories, but we should also doublecheck that this doesn't allow malicious archives to place files anywhere on the system.

fmeum avatar Apr 26 '24 10:04 fmeum

@fmeum thank you for the advice -- I was able to start https://github.com/bazelbuild/bazel/pull/22155 It is working. I finished 90% -- I can add remaining tests and finish it but I wanted some early feedback.

fzakaria avatar Apr 26 '24 18:04 fzakaria

@fmeum sent me next steps I am capturing here:

You will have to extend this file: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/bazel/bzlmod/IndexRegistry.java;l=131;drc=a87a8db23885535232ace3f64ac45e595c95d2f5

In particular the getRepoSpec method. You can mostly mimic the logic for patches, I think.

Once that lands, we can update the tooling in the bazel-central-registry repo.

fzakaria avatar May 11 '24 15:05 fzakaria

While @fzakaria is close to getting overlay support into Bazel, I noticed something that could make our lives easier today: It's possible to reuse patches from other module versions, which could be used to simplify reviews of C++ modules. See https://github.com/bazelbuild/bazel-central-registry/pull/2024/files#diff-90fa618704d4146385f96570f2033f70b09f59ae5c630825ea0dde357520ebd5R5 for an example.

fmeum avatar May 13 '24 20:05 fmeum

Final piece is up for PR #2046 2046

fzakaria avatar May 16 '24 21:05 fzakaria

Thanks to all who helped make this possible! <3

Wyverald avatar Jul 23 '24 23:07 Wyverald