rfcs
rfcs copied to clipboard
New rustc and Cargo options to allow path sanitisation by default
IRLO pre-RFC thread: https://internals.rust-lang.org/t/pre-rfc-cargo-profile-setting-to-sanitise-host-dependent-absolute-paths-enabled-by-default-for-release-builds/14504
Relevant GitHub issue and discussions: https://github.com/rust-lang/rust/issues/40552
This seems reasonable to me, including the step of (eventually) making this default to enabled in release builds. That'll allow people to turn it on in debug if they need to ship debug builds, or conversely, turn it off in release mode if they have a specific need for absolute paths in release builds.
Finally, soon i can remove my 2km long --remap-path-prefix entries from $RUSTFLAGS, neat!
With regard to one of the stated drawbacks:
With trim-path enabled, if the debug option is simultaneously not false (it is turned off by default under release profile), paths in debuginfo will also be remapped. Debuggers will no longer be able to automatically discover and load source files outside of the working directory. This can be remidated by debugger features remapping the path back to a filesystem path.
This seems like a fairly concerning drawback to me. I do a lot of profiling with binaries compiled in release mode, and find it extremely useful that the source code is displayed in the profiling tools. It sounds to me like what this is saying is that workflow will break, and I will have to know to disable trim-path to get it working again. The fix itself isn't too bad. It's mildly annoying, but as long as you know what the fix is, it's probably worth the extra annoyance to gain some added privacy by default. My thinking is that discovering that one needs to disable this option to get debuggers and profiling tools to show you source code might be quite difficult though. There is no obvious point in the process of debugging or profiling that will point you towards trim-path. And in particular, it is later said that
removing absolute paths for all build would be a hassle for debugging
But debugging a release binary is not that uncommon in my experience. So it sounds to me like we are making that experience worse by default. I'm not sure that's the right trade off to make, although it is somewhat murky.
Would it be possible to separate trimming of paths inside executables from trimming of paths in external debug files?
I distribute executables, but keep dSYM bundles private. This way I can still symbolicate and debug crashes, but I'm not shipping bloated executables or exposing symbol information in the executable itself.
Trimming of paths in the executables, like paths from panic!/file! would be great — it'd improve privacy and reduce bloat. And keeping full absolute paths in dSYM would keep maximal compatibility with debuggers, profilers, etc.
I know some people use --release flag during development just because the debug code is too slow to run, or it doesn't make sense to profile unoptimized code. But this could be fixed by either educating users about opt-level, and/or maybe adding a convenience --debug-opt flag.
@kornelski
Would it be possible to separate trimming of paths inside executables from trimming of paths in external debug files?
That seems like a really good plan. We could start by having trim-path available, and then enable trim-path by default on platforms where we have separated debug symbols by default.
People can also still enable it themselves, and we can consider the tradeoffs for enabling it by default in release mode if not generating debug symbols.
Something that came up in today's @rust-lang/cargo meeting: we should set a variable CARGO_TRIM_PATH or similar for build.rs scripts.
Reading over this one idea I had is that this could be a nice opportunity to leverage community norms and such to make "linkifying" backtraces more easy. For example given a path coming out of a Rust backtrace it'd be awesome to be able to generate an HTTP URL to the exact copy of the source code (with a line number) that the code references for some paths. An RFC would be, in theory, a great place to specify that two canonical sources of Rust code have well-defined paths in "remapped debuginfo situations":
- All rustc source code comes from
/rustc/$rustc_sha/.... This is somewhat grandfathered in but we could presumably change rustc at any time too if we want. - Crates.io-sourced code could be
/cratesio/$crate/$version/...(or something like that).
I think it would be a lot more difficult to automatically translate paths to URLs for other sources of code (e.g. general git repositories, path dependencies, etc), but if we could specify what happens to rustc and crates.io-sourced code, that'd be nice! We could then leave a different default remapping source for all other crates that don't fall in the crates.io category, such as /src/$crate/$version/... or something like that.
Also, when reading this RFC, I think that we'll also want --remap-path-prefix for OUT_DIR-included source code. This means that for packages that have a build script, and we could set the path to something like /out-dir/$crate/$version by default (or something similar).
@alexcrichton One thing I'm slightly annoyed about a path-like prefix (which was proposed in the pre-RFC) is the differences between path separators on Windows vs *nix. That's purely aesthetic and we could format it differently depending on the host, or we could simply not bother and say /cratesio/$crate/$version/src\foo.rs looks fine ~~(or take the average and use |cratesio|$crate|$version but that's an abomination)~~
Regarding linking sysroot and crates.io sources to a URL, I assume that's something the backtrace library can do at runtime?
I'm not quite sure how OUT_DIR works. Is the compiled build script placed under it to be run?
My general idea is that any project which wants to linkify backtraces in Rust code can automatically linkify Rust standard library source code and dependencies coming from crates.io since those are hosted at canonical locations. Cargo is uniquely positioned with a flag like this to set precedent for how everyone should do this, so everyone canonicalizes around the same way that filenames look.
For issues like path separators, we'd canonicalize on one style. For windows/unix, we'd still just canonicalize on one style (the files won't exist locally anyway). Some of this may require rustc support rather than "purely just a flag from Cargo", but my point is more general in that this is an opportunity ripe for the picking in an RFC like this, I think.
Note that I'm not expecting libraries like backtrace to like download sources at crash time, but rather that offline systems which symbolize crashes with Rust stack frames will better be able to link to code which is located in crates.io or Rust libstd dependencies.
For OUT_DIR, it's an environment variable set during crate compilation per-crate to a directory buried within target/.... This is only set when the crate has a build script, and the assumption is that a build script can place rust source code in OUT_DIR (set during the build script's execution) which can then be include!'d as a form of code generation of Rust code.
Coming back to this after a month of busy uni stuff -
GCC and Clang both have three separate path remapping flags:
-fdebug-prefix-mapaffects only debuginfo,-fmacro-prefix-mapaffects only__FILE__-like macro expansions,-ffile-prefix-mapis the combination of both of above. This is what rustc's--remap-path-prefixdoes today.
Seeing the feedbacks to this RFC, I think people want granular treatments to the two different types of paths (debuginfo vs macro expansion) similar to what GCC and Clang has been doing for a while.
If rustc were to have both --remap-debuginfo-prefix and --remap-macro-prefix, trim-paths would be able to provide three-levels of behaviour
- off: No remapping
- macro expansions only: Remap only
file!()macro - full: Remap
file!()macro and debuginfo paths. Do what--remap-path-prefixcurrently does.
And the defaults under release profile would depend on the setting of split-debuginfo (whether split-debuginfo is explicitly set or implicitly set by platform-dependent defaults):
off: Fulltrim-pathsbecause debuginfo would be embedded in the executablepacked("all the debug information is packed into a separate file from the main executable"): Macro expansions-onlytrim-pathsunpacked: Fulltrim-pathson macOS but macro expansions-only on other Unix platforms (this option is not supported on Windows)
This way only the paths that will be contained in the binary will be affected, leaving debuginfo potentially untouched unless they are also embedded in the binary.
I think this is nice and neat and should be the end product we are looking for. But --remap-debuginfo-prefix and --remap-macro-prefix don't exist yet. So the remaining question I have is whether there is a way to implement an MVP for trim-paths without making major changes to rustc flags that will be compatible with the shiny future. Or maybe this is not worth it and we should just introduce the two flags first.
cant tell from thread.
will be explicit opt out (always trim in --release) or opt in (always need flag)? seems most add trim-path in go anyway so maybe make sense to be default in --release and not need extra flag always. except if paths are want for some reason
From the RFC:
This is enabled by default when you do a release build, such as via cargo build --release.
So trimming would be opt-out.
From the RFC:
This is enabled by default when you do a release build, such as via cargo build --release.
So trimming would be opt-out.
ok so only need flag if you want to keep paths, no change if you want them out always?
In release mode, yes.
is privacy issue here still being work or coming soon?
Further proposal has been added the RFC is ready for more reviews. Could someone add T-compiler label?
@cbeuw
If rustc were to have both --remap-debuginfo-prefix and --remap-macro-prefix, trim-paths would be able to provide three-levels of behaviour
- off: No remapping
- macro expansions only: Remap only file!() macro
- full: Remap file!() macro and debuginfo paths. Do what --remap-path-prefix currently does.
This came up in the context of the discussion around https://github.com/rust-lang/rust/issues/87745 but this should also mention "path remapping as applied to emitted diagnostics". That's functionality that gcc/clang currently lacks, which is a pain for certain build systems - eg linking the declared sources into a "sandbox" dir to detect use of undeclared sources, while still presenting the original source path (see Buck).
(Edit: I see that the current draft RFC does explicitly talk about diagnostics.)
This is moot for Cargo, at least as it currently works.
One other comment: I think I'd like this to be more explicitly split into "changes to rustc" and "cargo user-facing changes + how they're build on the rustc changes". These concerns are mostly separate in the RFC as it stands, but they're still quite interleaved. My personal interest is in the rustc changes in isolation, in the context of non-cargo build systems.
Rendered view of most recent version of this PR: rendered
(To help avoid spending time suggesting fixes already made. :) )
i know its holiday time for some places but any updates on this going forward and implemented ?
Sorry it took me a while to get back to this.
Linkers spitting out paths to separate debuginfo file turns out to be a bigger rabbithole than I thought. ~~So far I haven't found /PDBALTPATH's equivalent in ld and lld that lets you change the embedded path to .dwo or rcgu.o files. But I don't think we need to figure it out for each linker to merge this RFC, it'll be a best-effort thing to try to support as many linker as we could.~~
~~Even if there turns out to be no way to change the path to separate debug information files with ld and lld,~~ the default use cases should still be fine. On Unix, fully embedded debug information are correctly remapped. On MSVC the default (separate .pdb file) for MSVC linker can be solved (more consideration needed for the cargo run senario but https://github.com/rust-lang/rust/issues/87825#issuecomment-944821843 should work). On macOS the default (separate .dSYM) is generated by dsymutil, which accepts --object-prefix-map options
Edit: I thought the inclusion of path to .dwo files are done by the linker. Turns out it's acutally rustc, so this isn't an issue anymore since we have full control
I have added another scope split-debuginfo-path, which takes effect only when split-debuginfo is packed or unpacked. In these cases the path to the split debug information files are embedded into the binary, whose sanitisation can be controlled using this new scope.
with all changes adding complexity will default still be full path removals in release mode? so many option is become quite confusing to not compiler dev i think and maybe will cause wrong option to be used leaking data
Yes, the default behaviour is still full path removal in release profile. There are quite a few sources of absolute paths hence the complexity, but the default is sane and complex things are still possible.
is this still in progress? no updates for awhile and been a couple rust releases since. @cbeuw
I'm not sure what else there is to add. There are a couple of unresolved points which I have noted in the text, but people seem fairly happy with the main design. I'll nag the subteams on Zulip to hopefully get another round of review and FCP.
A summary of recent changes:
- Instead of letting Cargo decide whether or not to sanitise debuginfo based on
-C split-debuginfosetting (which necessitated Cargo feeding different arguments torustcdepending on that codegen option), explicit scopesunsplit-debuginfoandsplit-debuginfoare now created to target only embedded and split debuginfo respectively.unsplit-debuginfodoes nothing when the debuginfo is split, andsplit-debuginfodoes nothing when the debuginfo is unsplit. - Instead of numbers, the value of
trim-pathsis now descriptive words (or list of words) that is passed through directly to--remap-path-scopes. This is achieved by extending--remap-path-scopeto take in aliases such asobject=macro,unsplit-debuginfo,split-debuginfo-file(default for release profiles). Note that this does not maketrim-pathsa simple pass through torustclike most other profile options, because Cargo needs to generate and emit the actual path mappings via--remap-path-prefixwhentrim-pathsis notnone.
Since most of the implementation complexities will be in rustc and not Cargo, I really want to hear what @rust-lang/compiler thinks about this (I know you guys are busy 😅)
Haven't had any further comments for a while. @joshtriplett could you tag the compiler team (I'm not in rust-lang org so I can't do it) so someone could have a look? I assume eventually they need to sign off since most changes are in the compiler.
I removed the line "We only need to change the behaviour for Test and Build compile modes" since people might want to be able to remap diagnostics in cargo check or cargo bench. trim-paths being effective to all compile modes is probably more straightforward