rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

New rustc and Cargo options to allow path sanitisation by default

Open cbeuw opened this issue 4 years ago • 47 comments

IRLO pre-RFC thread: https://internals.rust-lang.org/t/pre-rfc-cargo-profile-setting-to-sanitise-host-dependent-absolute-paths-enabled-by-default-for-release-builds/14504

Relevant GitHub issue and discussions: https://github.com/rust-lang/rust/issues/40552

Rendered

cbeuw avatar May 23 '21 23:05 cbeuw

This seems reasonable to me, including the step of (eventually) making this default to enabled in release builds. That'll allow people to turn it on in debug if they need to ship debug builds, or conversely, turn it off in release mode if they have a specific need for absolute paths in release builds.

joshtriplett avatar May 25 '21 20:05 joshtriplett

Finally, soon i can remove my 2km long --remap-path-prefix entries from $RUSTFLAGS, neat!

Evrey avatar May 28 '21 07:05 Evrey

With regard to one of the stated drawbacks:

With trim-path enabled, if the debug option is simultaneously not false (it is turned off by default under release profile), paths in debuginfo will also be remapped. Debuggers will no longer be able to automatically discover and load source files outside of the working directory. This can be remidated by debugger features remapping the path back to a filesystem path.

This seems like a fairly concerning drawback to me. I do a lot of profiling with binaries compiled in release mode, and find it extremely useful that the source code is displayed in the profiling tools. It sounds to me like what this is saying is that workflow will break, and I will have to know to disable trim-path to get it working again. The fix itself isn't too bad. It's mildly annoying, but as long as you know what the fix is, it's probably worth the extra annoyance to gain some added privacy by default. My thinking is that discovering that one needs to disable this option to get debuggers and profiling tools to show you source code might be quite difficult though. There is no obvious point in the process of debugging or profiling that will point you towards trim-path. And in particular, it is later said that

removing absolute paths for all build would be a hassle for debugging

But debugging a release binary is not that uncommon in my experience. So it sounds to me like we are making that experience worse by default. I'm not sure that's the right trade off to make, although it is somewhat murky.

BurntSushi avatar May 28 '21 12:05 BurntSushi

Would it be possible to separate trimming of paths inside executables from trimming of paths in external debug files?

I distribute executables, but keep dSYM bundles private. This way I can still symbolicate and debug crashes, but I'm not shipping bloated executables or exposing symbol information in the executable itself.

Trimming of paths in the executables, like paths from panic!/file! would be great — it'd improve privacy and reduce bloat. And keeping full absolute paths in dSYM would keep maximal compatibility with debuggers, profilers, etc.

kornelski avatar May 28 '21 15:05 kornelski

I know some people use --release flag during development just because the debug code is too slow to run, or it doesn't make sense to profile unoptimized code. But this could be fixed by either educating users about opt-level, and/or maybe adding a convenience --debug-opt flag.

kornelski avatar May 28 '21 15:05 kornelski

@kornelski

Would it be possible to separate trimming of paths inside executables from trimming of paths in external debug files?

That seems like a really good plan. We could start by having trim-path available, and then enable trim-path by default on platforms where we have separated debug symbols by default.

People can also still enable it themselves, and we can consider the tradeoffs for enabling it by default in release mode if not generating debug symbols.

joshtriplett avatar May 28 '21 17:05 joshtriplett

Something that came up in today's @rust-lang/cargo meeting: we should set a variable CARGO_TRIM_PATH or similar for build.rs scripts.

joshtriplett avatar Jun 01 '21 19:06 joshtriplett

Reading over this one idea I had is that this could be a nice opportunity to leverage community norms and such to make "linkifying" backtraces more easy. For example given a path coming out of a Rust backtrace it'd be awesome to be able to generate an HTTP URL to the exact copy of the source code (with a line number) that the code references for some paths. An RFC would be, in theory, a great place to specify that two canonical sources of Rust code have well-defined paths in "remapped debuginfo situations":

  • All rustc source code comes from /rustc/$rustc_sha/.... This is somewhat grandfathered in but we could presumably change rustc at any time too if we want.
  • Crates.io-sourced code could be /cratesio/$crate/$version/... (or something like that).

I think it would be a lot more difficult to automatically translate paths to URLs for other sources of code (e.g. general git repositories, path dependencies, etc), but if we could specify what happens to rustc and crates.io-sourced code, that'd be nice! We could then leave a different default remapping source for all other crates that don't fall in the crates.io category, such as /src/$crate/$version/... or something like that.

Also, when reading this RFC, I think that we'll also want --remap-path-prefix for OUT_DIR-included source code. This means that for packages that have a build script, and we could set the path to something like /out-dir/$crate/$version by default (or something similar).

alexcrichton avatar Jun 01 '21 19:06 alexcrichton

@alexcrichton One thing I'm slightly annoyed about a path-like prefix (which was proposed in the pre-RFC) is the differences between path separators on Windows vs *nix. That's purely aesthetic and we could format it differently depending on the host, or we could simply not bother and say /cratesio/$crate/$version/src\foo.rs looks fine ~~(or take the average and use |cratesio|$crate|$version but that's an abomination)~~

Regarding linking sysroot and crates.io sources to a URL, I assume that's something the backtrace library can do at runtime?

I'm not quite sure how OUT_DIR works. Is the compiled build script placed under it to be run?

cbeuw avatar Jun 01 '21 20:06 cbeuw

My general idea is that any project which wants to linkify backtraces in Rust code can automatically linkify Rust standard library source code and dependencies coming from crates.io since those are hosted at canonical locations. Cargo is uniquely positioned with a flag like this to set precedent for how everyone should do this, so everyone canonicalizes around the same way that filenames look.

For issues like path separators, we'd canonicalize on one style. For windows/unix, we'd still just canonicalize on one style (the files won't exist locally anyway). Some of this may require rustc support rather than "purely just a flag from Cargo", but my point is more general in that this is an opportunity ripe for the picking in an RFC like this, I think.

Note that I'm not expecting libraries like backtrace to like download sources at crash time, but rather that offline systems which symbolize crashes with Rust stack frames will better be able to link to code which is located in crates.io or Rust libstd dependencies.

For OUT_DIR, it's an environment variable set during crate compilation per-crate to a directory buried within target/.... This is only set when the crate has a build script, and the assumption is that a build script can place rust source code in OUT_DIR (set during the build script's execution) which can then be include!'d as a form of code generation of Rust code.

alexcrichton avatar Jun 01 '21 21:06 alexcrichton

Coming back to this after a month of busy uni stuff -

GCC and Clang both have three separate path remapping flags:

  • -fdebug-prefix-map affects only debuginfo,
  • -fmacro-prefix-map affects only __FILE__-like macro expansions,
  • -ffile-prefix-map is the combination of both of above. This is what rustc's --remap-path-prefix does today.

Seeing the feedbacks to this RFC, I think people want granular treatments to the two different types of paths (debuginfo vs macro expansion) similar to what GCC and Clang has been doing for a while.

If rustc were to have both --remap-debuginfo-prefix and --remap-macro-prefix, trim-paths would be able to provide three-levels of behaviour

  • off: No remapping
  • macro expansions only: Remap only file!() macro
  • full: Remap file!() macro and debuginfo paths. Do what --remap-path-prefix currently does.

And the defaults under release profile would depend on the setting of split-debuginfo (whether split-debuginfo is explicitly set or implicitly set by platform-dependent defaults):

  • off: Full trim-paths because debuginfo would be embedded in the executable
  • packed ("all the debug information is packed into a separate file from the main executable"): Macro expansions-only trim-paths
  • unpacked: Full trim-paths on macOS but macro expansions-only on other Unix platforms (this option is not supported on Windows)

This way only the paths that will be contained in the binary will be affected, leaving debuginfo potentially untouched unless they are also embedded in the binary.

I think this is nice and neat and should be the end product we are looking for. But --remap-debuginfo-prefix and --remap-macro-prefix don't exist yet. So the remaining question I have is whether there is a way to implement an MVP for trim-paths without making major changes to rustc flags that will be compatible with the shiny future. Or maybe this is not worth it and we should just introduce the two flags first.

cbeuw avatar Jun 27 '21 17:06 cbeuw

cant tell from thread.

will be explicit opt out (always trim in --release) or opt in (always need flag)? seems most add trim-path in go anyway so maybe make sense to be default in --release and not need extra flag always. except if paths are want for some reason

ghost avatar Jul 09 '21 12:07 ghost

From the RFC:

This is enabled by default when you do a release build, such as via cargo build --release.

So trimming would be opt-out.

the8472 avatar Jul 09 '21 12:07 the8472

From the RFC:

This is enabled by default when you do a release build, such as via cargo build --release.

So trimming would be opt-out.

ok so only need flag if you want to keep paths, no change if you want them out always?

ghost avatar Jul 09 '21 13:07 ghost

In release mode, yes.

the8472 avatar Jul 09 '21 13:07 the8472

is privacy issue here still being work or coming soon?

ghost avatar Aug 22 '21 18:08 ghost

Further proposal has been added the RFC is ready for more reviews. Could someone add T-compiler label?

cbeuw avatar Aug 31 '21 10:08 cbeuw

@cbeuw

If rustc were to have both --remap-debuginfo-prefix and --remap-macro-prefix, trim-paths would be able to provide three-levels of behaviour

  • off: No remapping
  • macro expansions only: Remap only file!() macro
  • full: Remap file!() macro and debuginfo paths. Do what --remap-path-prefix currently does.

This came up in the context of the discussion around https://github.com/rust-lang/rust/issues/87745 but this should also mention "path remapping as applied to emitted diagnostics". That's functionality that gcc/clang currently lacks, which is a pain for certain build systems - eg linking the declared sources into a "sandbox" dir to detect use of undeclared sources, while still presenting the original source path (see Buck).

(Edit: I see that the current draft RFC does explicitly talk about diagnostics.)

This is moot for Cargo, at least as it currently works.

jsgf avatar Oct 02 '21 21:10 jsgf

One other comment: I think I'd like this to be more explicitly split into "changes to rustc" and "cargo user-facing changes + how they're build on the rustc changes". These concerns are mostly separate in the RFC as it stands, but they're still quite interleaved. My personal interest is in the rustc changes in isolation, in the context of non-cargo build systems.

jsgf avatar Oct 02 '21 22:10 jsgf

Rendered view of most recent version of this PR: rendered

(To help avoid spending time suggesting fixes already made. :) )

follower avatar Oct 27 '21 05:10 follower

i know its holiday time for some places but any updates on this going forward and implemented ?

tuesdaycodes avatar Nov 26 '21 18:11 tuesdaycodes

Sorry it took me a while to get back to this.

Linkers spitting out paths to separate debuginfo file turns out to be a bigger rabbithole than I thought. ~~So far I haven't found /PDBALTPATH's equivalent in ld and lld that lets you change the embedded path to .dwo or rcgu.o files. But I don't think we need to figure it out for each linker to merge this RFC, it'll be a best-effort thing to try to support as many linker as we could.~~

~~Even if there turns out to be no way to change the path to separate debug information files with ld and lld,~~ the default use cases should still be fine. On Unix, fully embedded debug information are correctly remapped. On MSVC the default (separate .pdb file) for MSVC linker can be solved (more consideration needed for the cargo run senario but https://github.com/rust-lang/rust/issues/87825#issuecomment-944821843 should work). On macOS the default (separate .dSYM) is generated by dsymutil, which accepts --object-prefix-map options

Edit: I thought the inclusion of path to .dwo files are done by the linker. Turns out it's acutally rustc, so this isn't an issue anymore since we have full control

cbeuw avatar Dec 05 '21 01:12 cbeuw

I have added another scope split-debuginfo-path, which takes effect only when split-debuginfo is packed or unpacked. In these cases the path to the split debug information files are embedded into the binary, whose sanitisation can be controlled using this new scope.

cbeuw avatar Dec 05 '21 23:12 cbeuw

with all changes adding complexity will default still be full path removals in release mode? so many option is become quite confusing to not compiler dev i think and maybe will cause wrong option to be used leaking data

ghost avatar Dec 26 '21 17:12 ghost

Yes, the default behaviour is still full path removal in release profile. There are quite a few sources of absolute paths hence the complexity, but the default is sane and complex things are still possible.

cbeuw avatar Jan 20 '22 13:01 cbeuw

is this still in progress? no updates for awhile and been a couple rust releases since. @cbeuw

ghost avatar Mar 22 '22 12:03 ghost

I'm not sure what else there is to add. There are a couple of unresolved points which I have noted in the text, but people seem fairly happy with the main design. I'll nag the subteams on Zulip to hopefully get another round of review and FCP.

cbeuw avatar Apr 24 '22 12:04 cbeuw

A summary of recent changes:

  1. Instead of letting Cargo decide whether or not to sanitise debuginfo based on -C split-debuginfo setting (which necessitated Cargo feeding different arguments to rustc depending on that codegen option), explicit scopes unsplit-debuginfo and split-debuginfo are now created to target only embedded and split debuginfo respectively. unsplit-debuginfo does nothing when the debuginfo is split, and split-debuginfo does nothing when the debuginfo is unsplit.
  2. Instead of numbers, the value of trim-paths is now descriptive words (or list of words) that is passed through directly to --remap-path-scopes. This is achieved by extending --remap-path-scope to take in aliases such as object = macro,unsplit-debuginfo,split-debuginfo-file (default for release profiles). Note that this does not make trim-paths a simple pass through to rustc like most other profile options, because Cargo needs to generate and emit the actual path mappings via --remap-path-prefix when trim-paths is not none.

Since most of the implementation complexities will be in rustc and not Cargo, I really want to hear what @rust-lang/compiler thinks about this (I know you guys are busy 😅)

cbeuw avatar Apr 28 '22 23:04 cbeuw

Haven't had any further comments for a while. @joshtriplett could you tag the compiler team (I'm not in rust-lang org so I can't do it) so someone could have a look? I assume eventually they need to sign off since most changes are in the compiler.

cbeuw avatar May 20 '22 12:05 cbeuw

I removed the line "We only need to change the behaviour for Test and Build compile modes" since people might want to be able to remap diagnostics in cargo check or cargo bench. trim-paths being effective to all compile modes is probably more straightforward

cbeuw avatar May 26 '22 18:05 cbeuw