rules_rust icon indicating copy to clipboard operation
rules_rust copied to clipboard

Persistent worker for Rust rules

Open nikhilm opened this issue 5 years ago • 13 comments

Hello,

I have a simple persistent worker for Bazel that I've been playing with on some small projects.

Would the Rust rules be interested in integrating this? Since rustc does not support Java/TypeScript/Scala like "Compiler as an API", this really only takes advantage of in-crate incremental compilation introduced in rust 1.24. This is achieved by passing --codegen=incremental=/path/to/dir. The directory is named using a combination of hash(path to rustc wrapper) and the workspace name, with the goal of:

  1. Being shared across multiple worker instances when they are spawned by Bazel. This is why one cannot use something like the tempdir crate (would be deleted each time a worker shut down) or just a randomly named temp file (would not be shared across worker processes).
  2. Not being shared across workspaces.
  3. Not being shared across compiler updates. This should probably actually depend on the path to rustc itself, instead of the wrapper. I can fix that.

The questions I have are:

  1. Is this kind of cache sharing sound design? I don't know enough of Bazel internals nor rustc internals to have high confidence in answering yes to this.
  2. Since this project is itself in Rust, how would something like this be integrated into the rules? Options:
    1. Use pre-built binaries built with Cargo. (What I'm doing now for simplicity.)
    2. Re-write in C++.
    3. Have certain internal switches to the Rust rules to build the worker itself using the rustc wrapper directly.

If this all sounds reasonable, I can send relevant PRs. In particular, it would also be nice to have the worker and the current process wrapper combine their functionality so that an additional process spawn is saved (Go from worker -> process wrapper -> rustc to worker -> rustc.

nikhilm avatar Sep 20 '20 23:09 nikhilm

If we are to follow the behavior of Cargo exactly, we would also want to include the compilation mode in the incremental cache path (to reflect the Cargo default of target/{debug,release}/incremental.

nikhilm avatar Sep 20 '20 23:09 nikhilm

That sounds like a great option to offer better build speed.

damienmg avatar Sep 21 '20 14:09 damienmg

Is this kind of cache sharing sound design? I don't know enough of Bazel internals nor rustc internals to have high confidence in answering yes to this.

I think 'sound' is a pretty low bar, depending on what you mean. Reproducibility might be a good measure.

https://doc.rust-lang.org/rustc/codegen-options/index.html#incremental is not very specific, I assume that directory is per-crate? I don't see why avoiding sharing across workspaces is relevant, since the cache should be f(crate, rustc, crate-deps) which could overlap in different workspaces.

Since this project is itself in Rust, how would something like this be integrated into the rules? Options: Use pre-built binaries built with Cargo. (What I'm doing now for simplicity.) Re-write in C++. Have certain internal switches to the Rust rules to build the worker itself using the rustc wrapper directly.

I think that bootstrapping is fairly straight forward by making using the worker (internally?) optional, and disabling it when building the worker and it's dependents.


Last time I looked, there were still reproducibility issues w/ rustc builds, and I doubt incremental compilation helps. In the worst case, if incremental building reduces reproducibility, it is possible that output changes trigger more downstream re-building and incremental builds ultimately don't save any time.

I suspect that right now most rustc builds are going to rebuild all dependents anyways (though again, I have not looked into this recently), so it's very likely this is an improvement.

I also vaguely recall there being a performance trade-off to using --codegen-units, and have seen some places set this to 0.

Also, https://github.com/bazelbuild/rules_rust/issues/228 is tangentially related as another use-case for a worker.

mfarrugi avatar Sep 21 '20 23:09 mfarrugi

https://doc.rust-lang.org/rustc/codegen-options/index.html#incremental is not very specific, I assume that directory is per-crate? I don't see why avoiding sharing across workspaces is relevant, since the cache should be f(crate, rustc, crate-deps) which could overlap in different workspaces.

It also probably depends on the compilation mode, since Cargo tends to keep different dirs - target/{debug,release}/incremental. Is there a way to retrieve something similar to the crate-deps hash in Bazel, or should I simply hash all the deps?

I'll try the bootstrapping option and send a PR. Would you prefer 2 different PRs, one that simply added the worker code and another that actually enabled the build + integration?

Certainly even sandboxed rustc does not produce reproducible builds right now (tested on Linux and I have previous experiences on Windows), so I don't think this is a net negative.

Thanks for the inputs!

nikhilm avatar Sep 22 '20 03:09 nikhilm

Is there a way to retrieve something similar to the crate-deps hash in Bazel, or should I simply hash all the deps?

I don't need to handle this as rustc already takes care of the unique hash derivation after it is passed an incremental directory. It is quite likely that the implementation also takes care of capturing things like build flags, as it seems to propagate dependency information down the tree, so I've chosen to only combine the rustc path and the compilation mode.

nikhilm avatar Sep 22 '20 04:09 nikhilm

@mfarrugi Is there a better way to implement the following than what I'm thinking?

I think that bootstrapping is fairly straight forward by making using the worker (internally?) optional, and disabling it when building the worker and it's dependents.

Specifically, I'm having trouble with "and it's dependents". Since those are raze generated and use rust_library, I can't create my own rust_library_no_worker internal rule. The problem is, even to specify the worker executable to use as an _worker_executable property, I need an attrs that is completely different for my worker build rules and dependencies. Is there some way to control the behavior of dependencies?

nikhilm avatar Sep 23 '20 15:09 nikhilm

This probably requires an "internal only" attribute that raze could learn about, or editing the generated BUILD files. If we had no dependencies, this would be trivial, right? ie. we add a flag that is disable for just the worker binary.

If there is some other cleaner mechanism we can use for bootstrapping, I am not aware of it.

On Wed, Sep 23, 2020 at 11:05 AM Nikhil Marathe [email protected] wrote:

@mfarrugi https://github.com/mfarrugi Is there a better way to implement the following than what I'm thinking?

I think that bootstrapping is fairly straight forward by making using the worker (internally?) optional, and disabling it when building the worker and it's dependents.

Specifically, I'm having trouble with "and it's dependents". Since those are raze generated and use rust_library, I can't create my own rust_library_no_worker internal rule. The problem is, even to specify the worker executable to use as an _worker_executable property, I need an attrs that is completely different for my worker build rules and dependencies. Is there some way to control the behavior of dependencies?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_rust/issues/412#issuecomment-697505154, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAROG6DQI4WEOSVMM7ACLFDSHIFCLANCNFSM4RT5G2DQ .

mfarrugi avatar Sep 23 '20 18:09 mfarrugi

Yes, this is certainly hampered by the need to build dependencies. :( I may have to just rewrite the relevant bits in C++ or something. Let me look into how easy it is to tweak raze. Editing the generated BUILD files sounds ugly.

nikhilm avatar Sep 24 '20 04:09 nikhilm

What is the minimum Bazel version these rules need to support?

nikhilm avatar Sep 24 '20 05:09 nikhilm

I'm not sure, at some pointed I tagged minimum-bazel-22 but I haven't been active for quite a while.

If editing the BUILD files works, that is a great place to start. Ostensibly the dependencies shouldn't change too much, and then it gives us a clear picture of what raze needs to learn to do.

On Thu, Sep 24, 2020 at 1:01 AM Nikhil Marathe [email protected] wrote:

What is the minimum Bazel version these rules need to support?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bazelbuild/rules_rust/issues/412#issuecomment-698113322, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAROG6FUXLGYLZC3XB3LPITSHLHDNANCNFSM4RT5G2DQ .

mfarrugi avatar Sep 24 '20 05:09 mfarrugi

Sorry, I've been tending to real life for a while. I poked around at using build settings and transitions last night, including label build settings, hoping to bootstrap the worker, but hit a dead end. I'm going to try editing the BUILD files now and then file a raze issue.

nikhilm avatar Oct 03 '20 15:10 nikhilm

I've been out of town since mid December and will continue to be so for another 3 weeks. I don't have access to my development machine until then. I'll check back on some of this once I'm back.

nikhilm avatar Jan 14 '21 06:01 nikhilm

Minor changes were required to get this going with the latest rules_rust - they're available here if anyone needs them.

dae avatar Mar 27 '21 09:03 dae