Patching support for crate universe local vendoring
On Fuchsia we have an existing Cargo.toml that we use for cargo-vendor with a couple hundred crates. For dozens of those crates we patch out the dependencies to point to forks elsewhere in our tree:
[patch.crates-io]
fuse3 = { path = "forks/fuse3" }
We're trying move to using crate universe to manage the vendoring, and #1645 (which this PR is rebased ontop of) is enough to make cargo-bazel not choke when it sees those path dependencies. At that point it emits a spliced Cargo.toml which still contains the patch.crates-io and hands it off to cargo vendor, and that correctly doesn't download any of the crates that are listed as patched.
Unfortunately BUILD file generation was still broken. In this example with 3 crates where b is patched out, cargo-bazel still creates a vendor/b/BUILD.bazel that contains a sources glob for **/*.rs but the sources are over in the patch location:
.
├── BUILD.bazel
├── Cargo.toml
├── my_forks
│ └── b
│ ├── Cargo.toml
│ └── src
└── vendor
├── a
│ ├── BUILD.bazel
│ ├── Cargo.toml
│ └── src
├── b
│ └── BUILD.bazel
└── c
├── BUILD.bazel
├── Cargo.toml
└── src
To fix the issue of sources (as well as crate root and compile data) pointing to nonsensical paths for the generated BUILD file, we tried to make the minimally intrusive change of letting those fields accept either a label or glob (rather than hardcoding the glob) and then creating filegroups in the patched crate that can be pointed to by their label from the generated file. Currently the code prints out a generated BUILD file for each patched crate, but the only difference for different patched crates would be if they had different source roots, so we'd be happy to simply document the filegroups that are required for patching to work and let the user handle copying a template into each of their patches. It would be possible for cargo-bazel to write out those BUILD files to the right locations itself, but we assumed this behavior was a bit too invasive and should at least not be enabled by default. Here's an example of the generated BUILD file:
filegroup(
name = "crate_root",
srcs = ["src/lib.rs"],
visibility = ["//third_party/vendor:__subpackages__"],
)
filegroup(
name = "srcs",
srcs = glob(["**/*.rs"]),
visibility = ["//third_party/vendor:__subpackages__"],
)
filegroup(
name = "compile_data",
srcs = glob(
include = ["**"],
exclude = [
"**/* *",
"BUILD",
"BUILD.bazel",
"WORKSPACE",
"WORKSPACE.bazel",
],
),
visibility = ["//third_party/vendor:__subpackages__"],
)
@illicitonion and/or @UebelAndre I'd love your thoughts on the approach here. An alternative we were exploring was to create a third vendoring "mode" called "local_repository" which still ran cargo vendor like the local mode, but defined each crate as its own repository like the remote mode. Then we could achieve patching by overriding all those maybe repositories with new_local_repository for each patched crate in our WORKSPACE.bazel. That required more extensive changes to cargo-bazel internals compared to this PR and we'd like to make patching work for existing users of the local mode without adding a different mode. Patching support is our last blocker to landing rules_rust usage in Fuchsia, so we wanted to make this change as uncontroversial as possible.
Before I'd consider this PR mergable, we should remove the println!s for patched crates and add a section about patching to the crate universe docs. Ideally we should add some testing but I'm not sure whether it's feasible to write e2e tests that fetch some deps from crates io and use some local patches and make sure that a crate that depends on both of those still compiles. That's what I've been doing locally to test this change.
Fixes #1631
Thanks for putting this together! Could you add a new example workspace to https://github.com/bazelbuild/rules_rust/tree/main/examples/crate_universe to illustrate what this concretely looks like in practice? It'd help guide reviewing :)
@illicitonion I've added an example and docs, hopefully they give a better sense of how exactly the patching behavior works. I still need to add another patched crate to the example that exercises the build_script functionality, but it will look pretty much the same as the patched rand_core.
@UebelAndre I think using the example as an integration test is the most valuable form of testing. Do you think it's worth adding a unit test which ensures that the Rule produced by collect_targets has filegroups rather than globs for its sources/crate_root/compile_data? That feels a bit too tied to the implementation details to me, and I don't think we can easily check in a unit test that cargo vendor doesn't actually download the sources for patched crates.
@UebelAndre sorry to revive this old PR but Fuchsia is trying again to start using bazel for rust and not being able to patch deps is still blocking us. I've rebased the changes, and the vendor_local_patching example README is a good representation of how we want to use this feature if you need a refresher.
Last time this comment was the blocking concern. On Fuchsia we use Cargo.tomls in our patched crates as the source of truth for how to build them. It's understandable that you'd want fully buildable bazel targets in those patched crates rather than just some exposed filegroups for the sources, but would you be ok with cargo_bazel still spitting out those generated BUILD files so we could copy them into the patch dirs instead of having to hand maintain BUILD files alongside cargo manifests?
I think we agreed last time around that it'd be unexpected/not advisable to have cargo_bazel stick files in directories outside your vendor dir, so I'd be happy to make this output them there with a note at the top that they need to be moved to your patch if you want to use it (instead of handwriting a BUILD file yourself), and then I'll use aliases like you suggested to "re-export" the patched crate from the crates_vendor BUILD file, which should make the other 3p crates that depend on patched ones work.
Let me know if there have been any big changes that impact this approach, or if you'd rather me just close this and make a fresh PR.
We've opted to clean this up and re-open it as a separate PR: #3071. Many of the original comments were stale after lots of rebasing, and the main point of discussion has been linked to and summarized from that new PR.