Support a first-class format for declaring external dependencies
This is an issue extracted from the discussion on https://github.com/rust-lang/rust-roadmap/issues/12. The high level idea is that Cargo should support a first-class method of declaring dependencies on external artifacts in a structured format. When combined with https://github.com/rust-lang/cargo/issues/3815 this would easily allow external build systems to resolve these dependencies to internal rules known by those build system. For example Buck/Bazel may have their own copy of OpenSSL compiled, and the openssl-sys crate should be connected to that copy (both literally at compile time but also in the dependency graph).
The purpose of this support is to allow the majority of build scripts in the ecosystem to largely be overridden and avoided at compile time. Build scripts tend to be difficult to wrangle in restrictive build systems, as they can have an unpredictable set of inputs (for an arbitrary build script) and are otherwise difficult to audit one by one (for any particular build script). By having a first-class description of what the build script would otherwise do, external build systems can assume by default that a build script need not be run.
Note that there's some prior work here to draw from as well:
- Cargo supports overriding build scripts, preventing their execution. This feature is somewhat underdeveloped, though, as it has yet to see much adoption in the community. It may be a good starting point!
- Crates like metadeps support structured configuration in `Cargo.toml`, read at build time (by the build script). In this case it's used specifically for calling pkg-config and, I believe, for integrating with distro builds.
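For reference, metadeps-style configuration lives under Cargo's free-form metadata table. A minimal sketch of what such a section might look like (the library names, versions, and feature key are illustrative, not a confirmed metadeps schema):

```toml
# In Cargo.toml — read by a build-time helper like metadeps, ignored by Cargo itself.
[package.metadata.pkg-config]
# Probe `testlib` via pkg-config, requiring at least version 1.2.
testlib = "1.2"
# Probe `testdata` only when a given Cargo feature is enabled.
testdata = { version = "4.5", feature = "use-testdata" }
```

Because everything is plain TOML, an external tool can read the same table without running any Rust code.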
I think completely ignoring the build.rs script when metadata is present would be suboptimal, as it is then an all-or-nothing choice. Some crates may be able to specify 99% of their deps as structured deps but still need a custom build.rs script for some remaining, not-yet-supported option.
I like the solution of metadeps much more, where the build.rs script still gets run but behaves very predictably, so for example a package manager can parse the pkg-config requirements and then be sure that build.rs will run successfully.
Ignoring build.rs is like "jailbreaking" a dependency version bound to force a disallowed version -- a useful sledgehammer, but best kept as a backup.
The metadeps approach is nice for modularity. If such libraries could work both at build time (build.rs) and at plan-extraction time (https://github.com/rust-lang/cargo/issues/3815), that would be especially cool.
As one possibility: I'd love to see a standardized mechanism to extend the build metadata without writing an explicit build.rs, in a way that allows multiple such extensions to combine. Suppose you have multiple crates like metadeps, which parse all their information from Cargo.toml, need no inputs, and provide outputs directly to Cargo; they only need to provide metadata on success, or an error message otherwise. What if you listed those crates in a buildext or similar key in Cargo.toml, and Cargo automatically built them as a dependency, invoked them by a standard interface at the time that it would invoke build.rs?
That would allow using the crate ecosystem to standardize key portions of build systems into declarative metadata, without privileging any particular implementation or limiting additions. And crates wouldn't need a programmatic build.rs file unless they wanted to do something not supported by such build-extension crates.
So, for instance, a project written in Rust, to bind to a C library, using bindgen at build time, and expose a Python interface could have buildext = ["metadeps", "metabind", "pyinterface"], and if that covers all your requirements, you wouldn't need build.rs at all. Cargo would build all three of those crates, invoke them, incorporate the additional metadata they emit, and then build the crate at hand.
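Spelled out as a manifest, that might look like the following; the `buildext` key and the helper-crate names are hypothetical, taken from the proposal above rather than from any existing Cargo feature:

```toml
[package]
name = "libfoo-py"        # hypothetical crate binding a C library for Python
version = "0.1.0"

# Hypothetical: Cargo would build each listed crate and invoke it through a
# standard interface at the point where it would otherwise run build.rs.
buildext = ["metadeps", "metabind", "pyinterface"]

# Each extension reads its own declarative section from the same manifest, e.g.:
[package.metadata.pkg-config]
foo = "2.0"
```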
How does that sound?
I've looked at my build.rs scripts, and in order to replace them, I'd need these features:
- Ability to configure which dependencies (including indirect dependencies-of-dependencies) are linked statically or dynamically, per project, per OS.
  - For example, on macOS I have to link to libpng statically, but I should link to zlib dynamically. On Windows everything is static. On Linux it's distro-dependent.
- Run bindgen for libraries that have a version-dependent ABI. For example, using system-wide libvpx requires using the same `.h` version as installed on the system.
  - But compiling bindgen is painfully slow, so I bundle several versions of pre-compiled `ffi.rs` and use a build script to pick one, falling back to bindgen only when I don't have the right version already.
- A Windows-compatible `pkg-config` alternative. On macOS and Linux the `pkg-config` crate works for 80% of cases, but on Windows packaging is a hopeless mess and I build from source instead :(
- Compatibility with `cmake` and `autotools` to build a complex library that is not just in C, but also builds architecture-dependent assembly files.
This issue came up in discussions the Firefox build team had with @alexcrichton last week. One thing that "build scripts as black boxes" makes difficult is caching build outputs with sccache. Currently we can cache the outputs of rustc invocations, but we still have to run all the build scripts, and some of them spend a lot of time doing things like invoking third-party build systems. If we had a more declarative syntax we could cache the output of build scripts and avoid unnecessary work. For this to work we'd need a full list of inputs and outputs for the build script up-front. I could imagine this being a little tricky for build scripts that are doing things like "build a project using the cmake crate to invoke its cmake build system" (servo-freetype-sys is one example I know of that does this.)
~~Another problem is that for build scripts (and Make, etc.) environmental variables are "dependencies" too.~~
edit: there's rerun-if-env-changed
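For context, a build script can already narrow its declared inputs by printing `cargo:` directives to stdout. A minimal sketch (the header path and environment-variable name are illustrative), structured as a function so the directive list is easy to inspect:

```rust
// build.rs sketch — declare the script's inputs via `cargo:` directives so
// Cargo (or a caching wrapper such as sccache) knows when a rerun is needed.

fn directives() -> Vec<String> {
    vec![
        // Re-run only if this wrapper header changes...
        "cargo:rerun-if-changed=src/wrapper.h".to_string(),
        // ...or if the env var pointing at the library location changes.
        "cargo:rerun-if-env-changed=LIBFOO_DIR".to_string(),
    ]
}

fn main() {
    for d in directives() {
        println!("{d}");
    }
}
```

With no such directives, Cargo falls back to re-running the script whenever anything in the package changes, which is exactly the black-box behavior that defeats caching.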
> For this to work we'd need a full list of inputs and outputs for the build script up-front
Even knowing the list of inputs & outputs after a single invocation of the build script (i.e., not up-front) should allow a bit of improvement here, as one would be able to avoid running the build script again.
Another alternate is allowing a way to ask a build script to "tell me what you need, but avoid doing any work" (though that may not be feasible for a build script to provide).
There's an RFC for build system integration now up at https://github.com/rust-lang/rfcs/pull/2136
Exposing pkg-config declarative interface in Cargo.toml would go a long way, moving build time dependencies from build.rs into static configuration that can be parsed by other package managers like Nix.
While it's true that pkg-config lacks Windows support, it would reflect reality for crates that were never tested nor used on Windows.
In https://github.com/rust-lang/cargo/issues/14903#issuecomment-2523842483, I propose a way of having "declarative build scripts" where you depend on a binary from another package for your build script and specify parameters for it. This would provide a way to experiment with pkg-config or another mechanism before it is directly included in Cargo. The first phase, multiple build scripts, is implemented. After that come build script parameters, artifact dependencies, and then build script delegation.
I see this as orthogonal, it's great to have more flexible build.rs support, but I'm talking about declaratively specifying system dependencies in Cargo.toml without any custom Rust code at build time. I'm happy to code this up if it would be considered.
If cargo is not using this, why not put it in package.metadata?
package.metadata might be fine, but I think a lot of the motivation here was the originally linked roadmap item, "Rust should integrate easily into large build systems".
Basically, there are tools that want to consume all the cargo project metadata in order to either redo the build in another system, or at least prepare a development environment in which all system libraries that are needed for any crate in the project are provided.
For the sake of those tasks, even if Cargo itself doesn't consume the pkg-config information, it is crucial that every build.rs that is doing native deps uses package.metadata or similar in the same way --- if there are N different ways of doing declarative external dependencies using declarative build scripts, we've made some progress, but not enough for the tools above to reliably get all system dependencies.
It would therefore be very nice if Cargo could coordinate fostering a single standard for how to declare pkg-config deps, so we have a single way of doing things, and multiple competing build scripts are just different ways of consuming that single declarative standard.
Without us using this, the data will be untested and effectively wrong, and will require back and forth with the people wanting to consume it to get it right. It can then go stale and become bad again.
Also, once it is supported in cargo, it is locked in. If people can prototype using Cargo extension points, we can learn lessons and iterate before locking in a design.
True, but I'd say it's most important to focus on what value proposition to promote to encourage projects to switch from build.rs and/or experience that they're already using to declarative external dependencies. Otherwise, it runs the risk of becoming an xkcd 927 situation.
I've sketched out the implementation in https://github.com/rust-lang/cargo/pull/16281
@domenkozar please note that the best way forward for this is to prototype outside of Cargo. As noted in the labels, this will then likely need an RFC.
btw the most popular crate for this sake right now is https://crates.io/crates/system-deps
And there is an issue about increasing the adoption rate of it so we can learn from there https://github.com/gdesmott/system-deps/issues/97
This issue is very old. In the early days of Rust it used to be enough to just call make and leave system dependencies just as tedious and annoying as they are in C projects (Cargo even supported bash scripts in build directly, without building a Rust binary)
I think the bar for sys crates has been raised since then. Now users expect cargo build to work on Windows and macOS too (out of the box, without having to debug build errors and manually install and configure non-Cargo dependencies).
Mature sys crates tend to have fallbacks for non-Linux-distro OSes, e.g. build from vendored sources, try Homebrew (taking into account its quirks), search some hardcoded paths where Windows installers put libraries, or at minimum support env vars for paths to pre-built lib and include dirs. Good sys crates even automatically disable pkg-config to stop it from breaking cross-compilation.
Even sys crates that don't support the OSes without package management tend to have some library-specific hacks, such as trying different .pc names for different distros that packaged it their own way using their naming scheme (foo vs libfoo vs vendorfoo vs foo10, plus workarounds for pkg-config having compiler-specific flags or not supporting static linking).
Even a sys crate that starts out as just a minimal wrapper around a plain pkg_config call (or cmake or meson) ends up having these fallbacks and workarounds contributed once it gets enough traction to start breaking people's builds.
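The name-fallback dance described above boils down to probing a list of candidate `.pc` names in order and taking the first hit. A minimal sketch, with the probe abstracted as a closure standing in for a real pkg-config call (the candidate names are illustrative):

```rust
// Try candidate pkg-config names in order and return the first that probes
// successfully, mirroring the foo/libfoo/vendorfoo/foo10 workaround loop
// that mature sys crates accumulate. `probe` stands in for a real call like
// the pkg-config crate's probe_library.
fn probe_first<'a, E>(
    candidates: &[&'a str],
    mut probe: impl FnMut(&str) -> Result<(), E>,
) -> Option<&'a str> {
    candidates.iter().copied().find(|&name| probe(name).is_ok())
}

fn main() {
    // Pretend only the distro-renamed "libfoo" package exists on this system.
    let installed = ["libfoo"];
    let found = probe_first(&["foo", "libfoo", "vendorfoo", "foo10"], |name| {
        if installed.contains(&name) { Ok(()) } else { Err(()) }
    });
    println!("{found:?}"); // prints Some("libfoo")
}
```

Keeping the candidate list as data rather than code is one step toward the declarative form discussed in this issue: an external tool could read the same list without executing the build script.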
I'm very skeptical of any declarative methods that assume OS packages are a thing, and effectively close the path to supporting anything other than the most trivial case of non-cross builds on unixlike OSes with a first-class native package manager that also packages a substantial number of 3rd party libraries.
Remember that pkg-config works tolerably only where distros put a ton of work to make thousands of snowflake C libraries work coherently together. Makers of Windows, macOS, iOS, etc. don't do that. These OSes don't have native package management, at least not in the form that pkg-config expects. They have some optional package managers available (3rd party ones and/or limited to a project/IDE), but those are incomplete and not well integrated into the OS compared to an average Linux distro.
OSes not built around package management are not able to track dependencies or install them in an automated way for an average user. Their bolted-on pkg-config transplants usually make non-redistributable binaries. This makes building binaries that work for other people using Windows/macOS/iOS extra painful, because not only the pkg-config-based builds aren't convenient nor reliable, it takes extra effort to prevent builds from trying to use pkg-config.
> I'm very skeptical of any declarative methods that assume OS packages are a thing, and effectively close the path to supporting anything other than the most trivial case of non-cross builds on unixlike OSes with a first-class native package manager that also packages a substantial number of 3rd party libraries.
As the person who primarily wrote the cross-compilation infrastructure for Nixpkgs, I absolutely care about all the sorts of platforms, both to build on and to build for, that you describe. We always maintain separate PKG_CONFIG_PATHs for build and host, in native and cross builds alike, for example. And we have our eyes set on Windows and non-Linux Unix as build and host platforms too.
The good thing about pkg-config names and versions is not even the tool, or most OS package managers, but simply the existing "how to get this package"-agnostic namespace of "special snowflake" C libraries. Declaring the dependency doesn't mean you can't also roll your own build.rs solution too. But it creates structured information that we can do better and better things with over time.