Reduce the need for users to write build scripts

Open epage opened this issue 1 year ago • 18 comments

Like RUSTFLAGS, build scripts are an important escape hatch. Like RUSTFLAGS (#12739), we should find replacements for common uses of build scripts so people don't have to reach for this escape hatch so often.

Reducing build scripts would

  • Improve build times
  • Reduce risk of bugs
  • Reduce the dependency review audit scope

Uses of build scripts

  • [ ] Version detection
    • Stabilizing rust-lang/rust#64797
    • Stabilizing rust-lang/rust#64796
  • [ ] Making cfg values available at runtime (e.g. in --bugreport, --version, or crash reports, or for tests building examples)
  • [ ] Making TARGET available at runtime (e.g. --version, building examples in tests)
  • [ ] Codegen
  • [ ] cfg_aliases
    • see also https://github.com/rust-lang/cargo/issues/14948#issuecomment-2581426536
    • see also https://github.com/rust-lang/rfcs/pull/3804
  • [ ] Feature warnings (deprecations, a feature being "disabled" due to a platform or another feature, etc)
    • see also https://github.com/rust-lang/rfcs/pull/3486
  • [ ] Catch-all: rust-lang/cargo#14903
    • Note: this would also allow consolidating the ecosystem which would help with baking-in more features to replace build scripts
    • Enable linker warnings on Windows
    • Embedded windows manifests (#16328)
    • Can we use this for -sys crates / FFI?

epage avatar Dec 17 '24 17:12 epage
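
For the TARGET item in the list above, the build script that people tend to write today can be sketched roughly as follows. This is an illustration rather than anything posted in the thread: the variable name `BUILD_TARGET` and the helper `target_directive` are hypothetical. It assumes that Cargo sets `TARGET` in a build script's environment and that the `cargo::rustc-env` instruction is available.

```rust
// build.rs (sketch)
use std::env;

/// Formats the Cargo instruction that exposes `target` to the crate as the
/// compile-time environment variable `BUILD_TARGET` (a hypothetical name).
fn target_directive(target: &str) -> String {
    format!("cargo::rustc-env=BUILD_TARGET={target}")
}

fn main() {
    // Cargo sets TARGET when it runs a build script; outside Cargo it is
    // absent, in which case this sketch emits nothing.
    if let Ok(target) = env::var("TARGET") {
        println!("{}", target_directive(&target));
    }
}
```

The crate could then read the value with `env!("BUILD_TARGET")`, for example when printing `--version` output.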

I would draw a distinction between main binary build scripts (e.g. the executable or cdylib) and libraries. I think a build script for an executable is far more tolerable than one for a library. They're in direct control of the crate author and don't have any other users. Also, a single top-level build script is going to have negligible impact on build times for all but the smallest projects.

Which is not to say reducing the need for them is not important too, just that reducing the need for library build scripts will likely have the far bigger impact.

ChrisDenton avatar Dec 26 '24 23:12 ChrisDenton

This is not about trying to remove all build scripts but finding patterns in the community and providing a solution to avoid writing one yourself. In some cases, this will be using an artifact dependency as your build script. This is a general solution that helps libs and bins. And the benefits extend to bins. For smaller projects it democratizes the benefits of build scripts by making it easier to discover what you can do, easier to implement, and with fewer bugs. For larger projects with multiple bins, it makes reuse easier, reduces build times, and makes it easier to audit your own code base.

Yes, libs will get more benefits as we prioritize solutions to implement, but understanding and tracking the needs of bins takes little away from that and instead gives a better understanding for designing solutions.

epage avatar Dec 26 '24 23:12 epage

I don't know if this is important enough, but I also have a small use-case: I integrate a version string from an environment variable into my program during compile time. A natural way to do this would be the env!() macro, but this will not rebuild if the env variable changes.

So I am forced to use a build script with the cargo::rerun-if-env-changed declaration.

kamulos avatar Jan 03 '25 10:01 kamulos

env!() should rebuild when the env var changes. We changed rustc to record which env vars are read using env!() into the dep info file years ago and at the same time changed cargo to rebuild crates when env vars mentioned in the dep info file change.

bjorn3 avatar Jan 03 '25 10:01 bjorn3

Oh that works! 😊 I don't know what went wrong the last time I tested it, sorry for the confusion.

kamulos avatar Jan 03 '25 12:01 kamulos

Cargo might not fully track its own env variables which would be a bug.

iirc try_env! does not report env variables for tracking, at least if they aren't there.

epage avatar Jan 03 '25 12:01 epage

> Cargo might not fully track its own env variables which would be a bug.
>
> iirc try_env! does not report env variables for tracking, at least if they aren't there.

Did you mean option_env!? They're also tracked by # env-dep: comment in rustc dep-info files.

weihanglo avatar Jan 03 '25 18:01 weihanglo
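
A minimal sketch of the approach discussed above: embedding a version string from the environment without a build script. The variable name `MYAPP_BUILD_VERSION` and the function name are hypothetical. `option_env!` is used instead of `env!` only so that the example still compiles when the variable is unset.

```rust
/// Version string baked in at compile time from the (hypothetical)
/// MYAPP_BUILD_VERSION variable, with a fallback when it was not set.
/// Per the comments above, rustc records the variable in the dep-info
/// file, so Cargo should rebuild the crate when its value changes.
fn build_version() -> &'static str {
    option_env!("MYAPP_BUILD_VERSION").unwrap_or("unknown")
}

fn main() {
    println!("version: {}", build_version());
}
```

With `env!("MYAPP_BUILD_VERSION")` the build would instead fail when the variable is missing, which may be preferable for release builds.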

Can we add "cfg aliases" to this list (i.e. what is implemented using a build script by the https://docs.rs/cfg_aliases/latest/cfg_aliases/ crate)?

Winit is a prominent crate that depends on this cfg_aliases, and from a Dioxus/Blitz perspective I understand why they've done this. We support 6 platforms (windows/macos/linux/ios/android/web) and try to provide fine(ish)-grained feature flagging for each supported feature to allow users to optimise build times and binary sizes. This leads to cfg expressions that spread to 6 lines or more.

Use cases where this commonly comes up are where there is functionality that works on N of M platforms (e.g. 3 of 6), that you also want to be controllable with a feature flag (users may or may not want the function), and that you want to be disabled on platforms where it doesn't work even when the feature is enabled (as otherwise it becomes burdensome for users to set platform-specific features (where this is even possible)).

To give an entirely concrete example: in blitz-shell we use muda for implementing system menus on windows/macos/linux(desktop), but this functionality isn't available and doesn't make sense on android/ios/web. We also wish to make it optional on platforms that do support it. So we need a feature flag AND platform-specific cfgs, which ends up being a lot of boilerplate.

It would be nice if we could define aliases (either in lib.rs or Cargo.toml) and then use a simple #[cfg(menu)].

nicoburns avatar Jan 09 '25 23:01 nicoburns
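
To make the boilerplate described above concrete, here is a rough sketch of what the combined feature-plus-platform condition looks like without an alias. The feature name `menu` and the function `menu_backend` are hypothetical and are not taken from blitz-shell; with an alias, each attribute would shrink to something like `#[cfg(menu)]` and `#[cfg(not(menu))]`.

```rust
// Enabled only when the (hypothetical) `menu` feature is on AND the target
// is one of the desktop platforms where system menus are supported.
#[cfg(all(
    feature = "menu",
    any(target_os = "windows", target_os = "macos", target_os = "linux")
))]
fn menu_backend() -> &'static str {
    "muda"
}

// The same condition has to be repeated, negated, for the fallback.
#[cfg(not(all(
    feature = "menu",
    any(target_os = "windows", target_os = "macos", target_os = "linux")
)))]
fn menu_backend() -> &'static str {
    "none"
}

fn main() {
    println!("menu backend: {}", menu_backend());
}
```

Every item that depends on the menu support needs the same multi-line expression, which is the repetition an alias would remove.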

@nicoburns I've added cfg_aliases. An intermediate solution would be to stabilize metadeps and then you could declare cfg_aliases as one of your build scripts, passing parameters to it. This would mean only one build script bin related to cfg_aliases would be built across your entire dependency tree or need to be audited.

epage avatar Jan 10 '25 02:01 epage

I feel like the use cases given might have some overlap with global, mutually exclusive features and I wonder if there is a way to fit a higher level construct into that.

epage avatar Jan 10 '25 02:01 epage

Another use case we have is to record the rustc version used to compile with our library, which we telemeter if enabled. Currently I don't see anything obvious recorded in ELF or PE/COFF executables that records the rustc version used - perhaps it's in the debug info, but that may not have been deployed - but perhaps making some basic info available in header or section data - perhaps the debug section - could be exposed at runtime. I admit I don't know much about ELF - Windows had been my primary dev env for decades - but looking at a release build from rustc, it does look like GCC is recording quite a bit of build/version information, so maybe it's warranted to add some rustc info? MSVC (and apparently clang on Windows now) also stores some linker info in the PE/COFF as well.

I mean, there are other ways as well, like linking in a function to get the info, but stamping this in the executable might be useful to external tools as well. Just some thoughts.

heaths avatar Jan 21 '25 19:01 heaths
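
The use case above is typically handled today with a build script along these lines. This is a hedged sketch, not code from the thread: the helper names and the `RUSTC_VERSION_STRING` variable are hypothetical. It assumes that Cargo sets `RUSTC` for build scripts and that `rustc --version` prints a line of the form `rustc <version> (...)`.

```rust
// build.rs (sketch)
use std::env;
use std::process::Command;

/// Runs `<rustc> --version` and returns the trimmed output line, if any.
fn rustc_version_line(rustc: &str) -> Option<String> {
    let output = Command::new(rustc).arg("--version").output().ok()?;
    if !output.status.success() {
        return None;
    }
    Some(String::from_utf8(output.stdout).ok()?.trim().to_string())
}

/// Extracts the bare version (e.g. "1.77.2") from a `rustc --version` line.
fn parse_version(line: &str) -> Option<&str> {
    line.strip_prefix("rustc ")?.split_whitespace().next()
}

fn main() {
    // Fall back to whatever `rustc` is on PATH when run outside Cargo.
    let rustc = env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    if let Some(line) = rustc_version_line(&rustc) {
        // Exposes the full line to the crate via env!("RUSTC_VERSION_STRING").
        println!("cargo::rustc-env=RUSTC_VERSION_STRING={line}");
        if let Some(version) = parse_version(&line) {
            println!("cargo::warning=building with rustc {version}");
        }
    }
}
```

Spawning the compiler just to learn its version is the kind of cost that built-in version detection would avoid.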

> Currently I don't see anything obvious recorded in ELF or PE/COFF executables that record the rustc version used

The .comment section for ELF executables contains the version of rustc as well as many other tools that were involved in producing the executable. For example:

$ readelf --string-dump=.comment ~/.cargo/bin/rustup

String dump of section '.comment':
  [     0]  GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-44)
  [    2d]  GCC: (GNU) 9.5.0
  [    3e]  rustc version 1.77.2 (25ef9e3d8 2024-04-09)
  [    6a]  clang version 18.1.0rc

As for PE/COFF executables, it looks like the PDB debuginfo file will contain the rustc string. At least https://rust.godbolt.org/z/3zxrKehsK shows the clang LLVM (rustc version 1.84.0 (9fc6b4312 2025-01-07)) string ending up in the .debug$S section of the object file.

bjorn3 avatar Jan 21 '25 20:01 bjorn3

Interesting. I ran readelf -a <path> and it didn't dump the .comment section, but doing what you did explicitly worked. Maybe std could expose that if there's enough demand.

As for PE/COFF, I expected this was in the .pdb somewhere, but it's rare that symbols are deployed. What might be ideal is to actually store it in a debug directory like MSVC does, something to the effect of:

$ dumpbin /headers /section:.rdata <exe>

# ...
  Debug Directories

        Time Type        Size      RVA  Pointer
    -------- ------- -------- -------- --------
    678FF0B6 cv            4E 00003960     2960    Format: RSDS, {0466A304-9921-46F8-8636-239DB83AA58C}, 1, <path>.pdb
    678FF0B6 feat          14 000039B0     29B0    Counts: Pre-VC++ 11.00=0, C/C++=35, /GS=35, /sdl=1, guardN=34
    678FF0B6 coffgrp      30C 000039C4     29C4    4C544347 (LTCG)
    678FF0B6 iltcg          0 00000000        0

heaths avatar Jan 21 '25 21:01 heaths

Another important use-case I want to mention is FFI (especially with C/C++), which probably broadly fits under the codegen category.

Build scripts are encouraged for bindgen, CXX and CXX-Qt. CXX and CXX-Qt both rely on the cc crate, which also encourages use in build scripts.

Especially in the case of cc/CXX/CXX-Qt, this means the build script invokes the C/C++ compiler. This can be quite detrimental to compile time, as the ever-changing out dir requires a total rebuild of all C++ code even on simple changes. Incremental builds are currently impossible for the C++ code.

Here are the timing results of rebuilding CXX-Qt after touching a single file in cxx-qt-gen: cargo-timing.zip

In this example, the build scripts take up about 17.34s out of 20.5s (~85%).

LeonMatthesKDAB avatar Feb 11 '25 08:02 LeonMatthesKDAB

@LeonMatthesKDAB FFI was listed under #14903 in the list. While maybe we can eventually find a stable interface to declare these (though I suspect we won't), sharing more of the implementation of -sys crates would help reduce bugs and reduce the number of host binaries that need to be built.

As for problems with an ever-changing out dir, I'm not seeing an existing issue and it might be good to open one for us to explore what could be done. Granted, if we could find a way to have C/C++ be a part of our build graph and know how to handle intermediate files, that would be amazing. However, that is probably even farther in the future and builds on the work of #14903

epage avatar Feb 11 '25 14:02 epage

@epage Regarding the out_dir issue, I think the issue that comes closest is #7197 .

This was closed years ago with the note that build scripts should take care of incremental builds themselves. However, I think there is still no concrete guideline for how to actually do that at this point?

The best practice for build scripts is that they should only write to their OUT_DIR. However, new invocations of the build script (to my knowledge) have no way of accessing previous OUT_DIRs, which makes doing anything incrementally difficult. It is also unclear when exactly artifacts could be reused from previous invocations, but that's likely an issue that's specific to each build script.

Should I create a new issue for this or re-open #7197 ?

With CXX-Qt concretely, we have vague plans to try our hands on this by combining the scratch directory with hashing of the environment and using sccache to cache the actual C++ compilation. The largest issue with something like sccache at the moment is that sccache will include the full path of the file in its hash, which means sccache does not provide any benefit to files in the OUT_DIR. This wouldn't need any changes in Cargo, but is also a pretty experimental solution to the problem.

LeonMatthesKDAB avatar Feb 12 '25 10:02 LeonMatthesKDAB
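
One way the "hashing of the environment" idea above could look inside a build script is sketched below. This is an assumption-laden illustration, not CXX-Qt's actual plan or code: the stamp file name, the helper names, and the choice of inputs are all hypothetical, and a real script would hash source contents and compiler flags as well.

```rust
// build.rs (sketch)
use std::collections::hash_map::DefaultHasher;
use std::env;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;

/// Hypothetical name of the file that remembers the last built fingerprint.
const STAMP_FILE: &str = "inputs.stamp";

/// Hashes the key/value inputs that influence the C++ build.
/// DefaultHasher's algorithm is unspecified and may change between Rust
/// releases, which here only costs one extra rebuild after an upgrade.
fn fingerprint(inputs: &[(&str, &str)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (key, value) in inputs {
        key.hash(&mut hasher);
        value.hash(&mut hasher);
    }
    hasher.finish()
}

/// True when the stamp in `out_dir` matches the given fingerprint.
fn is_up_to_date(out_dir: &Path, fp: u64) -> bool {
    fs::read_to_string(out_dir.join(STAMP_FILE))
        .map(|stored| stored.trim() == fp.to_string())
        .unwrap_or(false)
}

/// Records the fingerprint; call this only after a successful build.
fn record(out_dir: &Path, fp: u64) -> io::Result<()> {
    fs::write(out_dir.join(STAMP_FILE), fp.to_string())
}

fn main() -> io::Result<()> {
    // OUT_DIR is set by Cargo; do nothing when run outside a build.
    let Ok(out_dir) = env::var("OUT_DIR") else {
        return Ok(());
    };
    let cxx = env::var("CXX").unwrap_or_default();
    let fp = fingerprint(&[("CXX", cxx.as_str())]);
    if !is_up_to_date(Path::new(&out_dir), fp) {
        // ... invoke the C++ compiler here (omitted in this sketch) ...
        record(Path::new(&out_dir), fp)?;
    }
    Ok(())
}
```

This only helps when the same OUT_DIR is reused between invocations, which is the open question in the comment above.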

Provided that the crate version and build configuration don't change, you should be getting the same OUT_DIR even when changing the source code of the crate.

bjorn3 avatar Feb 12 '25 11:02 bjorn3

> Provided that the crate version and build configuration doesn't change, you should be getting the same OUT_DIR even when changing the source code of the crate.

@bjorn3 You're right and I'm shocked that I didn't notice that before... At least on my local development machine it works and I get the same OUT_DIR...

I'll have to check what is going on in our CI then, as that gets almost no cache hits. Maybe the hash does end up being different there due to some environment variable set by GitHub Actions or something like that...

LeonMatthesKDAB avatar Feb 12 '25 12:02 LeonMatthesKDAB