rules_rust icon indicating copy to clipboard operation
rules_rust copied to clipboard

crate_universe: Rewrite generator in starlark

Open uhthomas opened this issue 2 years ago • 10 comments

This is just an idea, I'd really appreciate some feedback and ideas.

Is it conceivably possible to rewrite the generator in pure starlark? This would remove the bootstrapping problem of distributing generator binaries or hackily building the generator with the native cargo toolchain.

https://github.com/bazelbuild/rules_rust/blob/59fab4e79f62bfa13551ac851a40696c24c0c3a4/crate_universe/deps_bootstrap.bzl#L9

We're using a custom generator to work around some issues, but this makes the analysis phase almost a minute slower as repository rules are not remotely cacheable.

Finished release [optimized] target(s) in 38.77s

Failing a full rewrite, some more ideas:

  • Rewrite only some functionality in starlark.
  • Move some functionality out of the repository rule and into just a normal rule. This could be achieved with a macro. The normal rule could then check if repinning is required, whilst allowing the repository rule to generate the build files it needs.

Thanks for taking the time to read this :) Please don't hesitate to ask any questions.

uhthomas avatar Apr 22 '22 18:04 uhthomas

I'm not entirely sure the generator can be entirely written in starlark but the renderer side of it very likely could be and think that'd be great!

We're using a custom generator to work around some issues, but this makes the analysis phase almost a minute slower as repository rules are not remotely cacheable.

Finished release [optimized] target(s) in 38.77s

To clarify, are you building a new generator in repository rules? crate_universe as never designed for users to be compiling the generator in repository rules. Instead, the expectation is that users who want to compile their own binaries can do so and upload them to some static asset host (s3, artifactory) where devs and CI would pull them down for use in their builds.

That said though, if the renderer were to be implemented in starlark, then there would be no need to download or build a generator in most cases since if you have a trusted lockfile, the only computation left to do is write BUILD files from pre-defined templates which is something doable in most lanugages.

UebelAndre avatar Apr 27 '22 14:04 UebelAndre

Sounds great, thanks! I have some pretty good ideas on how to tackle this and I think the implementation could be quite nice.

I was thinking about the generator a bit and don't see why that couldn't also happen in Starlark. The cargo binary will happy output JSON which makes it easy to work with.

uhthomas avatar May 20 '22 13:05 uhthomas

I've made a prototype implementation and it looks pretty good so far. It has shown some weirdness with the format of the lockfile, but has been easy to work with otherwise.

The blocker at the moment is figuring out how to handle selects and cfg expressions (conditional compilation, syntax reference). Example:

cfg(target_family = "unix")

The lockfile isn't quite as data-oriented as it initially seems. It expects tools which work with the lockfile to be able to parse and evaluate these expressions.

Does it seem reasonable to suggest a slight change to the lockfile? I don't see any drawbacks for an explicit list of targets as opposed to a cfg expression.

@UebelAndre

uhthomas avatar Jun 12 '22 14:06 uhthomas

Does it seem reasonable to suggest a slight change to the lockfile? I don't see any drawbacks for an explicit list of targets as opposed to a cfg expression.

Can you provide an example of the specific change you're thinking of?

UebelAndre avatar Jun 12 '22 14:06 UebelAndre

Yep, of course.

So here's an excerpt from the JSON lockfile (represented as CUE for brevity):

"sha2 0.10.2": common_attrs: deps: selects: {
	"cfg(any(target_arch = \"aarch64\", target_arch = \"x86_64\", target_arch = \"x86\"))": [{
		id:     "cpufeatures 0.2.2"
		target: "cpufeatures"
	}]
}

Currently the "selects" deps are a map of cfg expressions to deps. I would propose that instead it should be a list of objects which include a list of platform names and deps. Such:

"sha2 0.10.2": common_attrs: deps: selects: [{
	platforms: [
		"aarch64-apple-darwin",
		"aarch64-apple-ios",
		"aarch64-linux-android",
		"aarch64-unknown-linux-gnu",
		"i686-apple-darwin",
		"i686-linux-android",
		"i686-pc-windows-msvc",
		"i686-unknown-freebsd",
		"i686-unknown-linux-gnu",
		"x86_64-apple-darwin",
		"x86_64-apple-ios",
		"x86_64-linux-android",
		"x86_64-pc-windows-msvc",
		"x86_64-unknown-freebsd",
		"x86_64-unknown-linux-gnu",
	]
	deps: [{
		id:     "cpufeatures 0.2.2"
		target: "cpufeatures"
	}]
}

uhthomas avatar Jun 12 '22 15:06 uhthomas

That'd probably be fine but seems like it'd be noisy. The reason I wrote the cfg string is because it was shorter and slightly more meaningful. The lockfile includes a mapping of those strings to a list of platforms so they only need to be defined once. Would it still work to use that?

"conditions": {
    "cfg(any(target_arch = \"aarch64\", target_arch = \"x86_64\", target_arch = \"x86\"))": [
        "aarch64-apple-darwin",
        "aarch64-apple-ios",
        "aarch64-linux-android",
        "aarch64-unknown-linux-gnu",
        "i686-apple-darwin",
        "i686-linux-android",
        "i686-pc-windows-msvc",
        "i686-unknown-freebsd",
        "i686-unknown-linux-gnu",
        "x86_64-apple-darwin",
        "x86_64-apple-ios",
        "x86_64-linux-android",
        "x86_64-pc-windows-msvc",
        "x86_64-unknown-freebsd",
        "x86_64-unknown-linux-gnu"
    ]
}

UebelAndre avatar Jun 12 '22 15:06 UebelAndre

I completely overlooked the conditions in the lockfile, thanks! I think that will work fine.

Is there a reference which explains the format of the lockfile? Something which isn't source code? Might be helpful for future readers.


re: it'd be noisy

I agree, though the lockfile is already very large.

❯ cat Cargo.Bazel.lock | wc -l
10719

uhthomas avatar Jun 12 '22 16:06 uhthomas

So, I got it all working! :)

Just need to clean some stuff up. Hopefully will open a PR for this within the next few days.

uhthomas avatar Jun 12 '22 22:06 uhthomas

If you're interested, this is the dev branch: https://github.com/bazelbuild/rules_rust/compare/main...uhthomas:1287.

There are a few more things to do, but it's possible to build and run the cargo-bazel binary with these changes which is a huge milestone.

❯ bazel run @rules_rust//crate_universe:bin
(11:08:42) INFO: Invocation ID: 94c7a23a-a6fa-4f96-8d8a-3639c3d5fa1a
(11:08:42) INFO: Current date is 2022-06-13
(11:08:42) INFO: Analyzed target @rules_rust//crate_universe:bin (0 packages loaded, 0 targets configured).
(11:08:42) INFO: Found 1 target...
Target @rules_rust//crate_universe:cargo_bazel_bin up-to-date:
  bazel-bin/external/rules_rust/crate_universe/cargo_bazel_bin
(11:08:42) INFO: Elapsed time: 0.226s, Critical Path: 0.01s
(11:08:42) INFO: 1 process: 1 internal.
(11:08:42) INFO: Build completed successfully, 1 total action
(11:08:42) INFO: Running command line: bazel-bin/external/rules_rust/crate_universe/cargo_bazel_b(11:08:42) INFO: Build completed successfully, 1 total action
cargo-bazel 0.2.2

USAGE:
    cargo_bazel_bin <SUBCOMMAND>

OPTIONS:
    -h, --help       Print help information
    -V, --version    Print version information

SUBCOMMANDS:
    generate    Generate Bazel Build files from a Cargo manifest
    help        Print this message or the help of the given subcommand(s)
    query       Query workspace info to determine whether or not a repin is needed
    splice      Splice together disjoint Cargo and Bazel info into a single Cargo workspace
                    manifest
    vendor      Vendor BUILD files to the workspace with either repository definitions or `cargo
                    vendor` generated sources

uhthomas avatar Jun 13 '22 10:06 uhthomas

@UebelAndre Would you mind checking out my branch and trying it out? It's not feature complete, though rules_rust builds with it (bazel builld //...) and we're also now using it internally. It seems to work well and bazel run @crate_universe_crate_index//:repin is blissful in comparison to the bazel sync stuff.

The vendoring of BUILD files had to be removed, apologies if this is a problem. The way the manifest is interpreted by starlark makes it infeasible.

However bazel query --output=build @crate_universe_crate_index//... or bazel query --output=build @crate_universe_crate_index__toml-0.5.9//... does show the expanded macros.

Looking forward to hearing your experience!

uhthomas avatar Jun 17 '22 16:06 uhthomas