rust icon indicating copy to clipboard operation
rust copied to clipboard

[RFC] Support `.comment` section like GCC/Clang (`!llvm.ident`)

Open ojeda opened this issue 2 years ago • 20 comments

Both GCC and Clang write by default a .comment section with compiler information:

$ gcc -c -xc /dev/null && readelf -p '.comment' null.o

String dump of section '.comment':
  [     1]  GCC: (GNU) 11.2.0

$ clang -c -xc /dev/null && readelf -p '.comment' null.o

String dump of section '.comment':
  [     1]  clang version 14.0.1 (https://github.com/llvm/llvm-project.git c62053979489ccb002efe411c3af059addcb5d7d)

They also implement the -Qn flag to avoid doing so:

$ gcc -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!

$ clang -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!

So far, rustc only does it for WebAssembly targets and only when debug info is enabled:

$ echo 'fn main(){}' | rustc --target=wasm32-unknown-unknown --emit=llvm-ir -Cdebuginfo=2 - && grep llvm.ident rust_out.ll
!llvm.ident = !{!27}

The RFC part of this PR is about which behavior should rustc follow:

  • Always add it.
  • Add it by default, i.e. have an opt-out flag (GCC, Clang).
  • Have an opt-in flag.
  • Never add it (current).

There is also the question of whether debug info being enabled matters for that decision, given the current behavior of WebAssembly targets.

For instance, adding it by default gets us closer to other popular compilers, but that may surprise some users with an information leak. The most conservative option is to only do so opt-in, even if debug info is enabled (some users may be stripping debug info and not expecting something else to be leaked elsewhere).

Implementation-wise, this covers both ModuleLlvm::new() and ModuleLlvm::new_metadata() cases by moving the addition to context::create_module and adds a few test cases.

ThinLTO also sees the llvm.ident named metadata duplicated (in temporary outputs), so this deduplicates it like it is done for wasm.custom_sections. The tests also check this duplication does not take place.

ojeda avatar May 30 '22 13:05 ojeda

r? @lcnr

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive avatar May 30 '22 13:05 rust-highfive

The job x86_64-gnu-llvm-12 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
error: make failed
status: exit status: 2
command: "make"
--- stdout -------------------------------
# `-Ccodegen-units=16 -Copt-level=2` is used here to trigger thin LTO
# across codegen units to test deduplication of the named metadata
# (see `LLVMRustPrepareThinLTOImport` for details).
echo 'fn main(){}' | LD_LIBRARY_PATH="/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/llvm-ident/llvm-ident:/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/lib" '/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/llvm-ident/llvm-ident -L /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/llvm-ident/llvm-ident  - --emit=link,obj -Csave-temps -Ccodegen-units=16 -Copt-level=2
# `llvm-dis` is used here since `--emit=llvm-ir` does not emit LLVM IR
# for temporary outputs.
"/usr/lib/llvm-12/bin/llvm-dis" /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/llvm-ident/llvm-ident/*.bc
--- stderr -------------------------------
--- stderr -------------------------------
warning: ignoring emit path because multiple .o files were produced
warning: 1 warning emitted


llvm-dis: Too many positional arguments specified!
Can specify at most 1 positional arguments: See: /usr/lib/llvm-12/bin/llvm-dis --help
make: *** [Makefile:12: all] Error 1



failures:

rust-log-analyzer avatar May 30 '22 13:05 rust-log-analyzer

I'm surprised this wasn't already added.

For instance, adding it by default gets us closer to other popular compilers, but that may surprise some users with an information leak.

I am pretty sure it is already possible to reliably identify the exact rustc version used soly based on the standard library functions linked into the executable. Without stripping you can trivially lookup the stable crate id from a libcore symbol name (the stable crate id depends among other things on the rustc version) and even without symbols I'm pretty sure there are enough differences between versions to identify which rustc is used. In any case it will be possible to at least identify that rustc is used if you don't restrict yourself to #![no_std] and avoid calling every non-trivial function in libcore and enable LTO.

bjorn3 avatar May 30 '22 14:05 bjorn3

Without stripping

Yeah, I am just trying to look at it from the shoes of users working on proprietary software where it is common to strip it and clean/obfuscate/tweak binaries in further ways. Some of them may suddenly realize their executable is now explicitly marked and come to complain about it (even if they should be checking for extraneous sections anyway).

even without symbols

Definitely, I am sure it is possible to find fancier ways. I wouldn't be surprised if NNs are being applied to the problem already. :)

ojeda avatar May 30 '22 15:05 ojeda

r? @bjorn3

Considering that this is annotated as [RFC], maybe you should open a PR to https://github.com/rust-lang/rfcs/ instead? Don't know how complex of a decision this is, so maybe a compiler team MCP is enough?

lcnr avatar May 30 '22 15:05 lcnr

(@bjorn3 To be clear, I wouldn't mind doing it by default and getting closer to GCC and Clang; I am just playing devil's advocate)

ojeda avatar May 30 '22 15:05 ojeda

Considering that this is annotated as [RFC], maybe you should open a PR to https://github.com/rust-lang/rfcs/ instead? Don't know how complex of a decision this is, so maybe a compiler team MCP is enough?

Definitely, if it is deemed necessary, I can write one. I wasn't sure, so I tagged it here for the moment. Thanks!

ojeda avatar May 30 '22 15:05 ojeda

FWIW, rustc is already setting the producer string and adding it to debuginfo: https://github.com/rust-lang/rust/blob/0f06824013761ed6829887019033f1001e68f623/compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs#L813-L816

This is what you get when you run strings on rustc output that has debuginfo:

$ echo 'fn main(){}' | rustc -Cdebuginfo=2 - && strings rust_out | rg "rustc version"
clang LLVM (rustc version 1.62.0-beta.2 (daf68b1f7 2022-05-19))

est31 avatar May 30 '22 15:05 est31

This is what you get when you run strings on rustc output that has debuginfo:

Yeah, that is DW_AT_producer; however sometimes the debug info may not be available (this came up as a possibility to know whether an object file came from C or Rust sources; with DWARF one may directly look into DW_AT_language instead for this purpose). It is also good to be consistent with the other compilers even if debug info is available in case some tooling looks for .comment.

ojeda avatar May 30 '22 16:05 ojeda

Maybe I missed the motivation for doing this:

  • consistency with clang and gcc
  • you have a use case: I need it for the linux kernel?
  • something else

tschuett avatar May 31 '22 18:05 tschuett

It is quite useful for easily figuring out which rustc version an executable was built with. This could for example be used to determine that it needs to be rebuilt if the rustc version it was built with has a CVE, but there are many more reasons to need which compiler version an executable was built with.

bjorn3 avatar May 31 '22 19:05 bjorn3

That does indeed sound useful.

tschuett avatar May 31 '22 19:05 tschuett

  • consistency with clang and gcc
  • you have a use case: I need it for the linux kernel?

To clarify a bit on those two you mention:

  • Tooling (and/or users) may expect or take advantage of the .comment section (e.g. it would be one less difference between C and Rust object files in the kernel, there is the security monitoring point from @bjorn3, etc.).

    Thus being consistent with GCC and Clang/LLVM is likely a good idea for those reasons (i.e. being consistent has other benefits; it is not just for the sake of consistency).

  • The kernel could use it (in principle) to distinguish C vs. Rust kernel modules when there is no debug info around. One can achieve the same in other ways, though.

    I do not want to say "we need it", because we might end up not using it, but it would be nice to have for the other reasons.

Another point of view (not exactly a motivation, but related) is that one could argue it should be left up to the backend, e.g. if LLVM usually does it (when given the information), then it sounds reasonable to let it do it. rustc could still provide a way for users to opt-out that backends should respect. Maybe target specs could carry that info too. (cc @antoyo for GCC).

ojeda avatar May 31 '22 20:05 ojeda

Sorry. I believe consistency is it should be named .comment and not idk .version.

The real question is: is it helpful for rust users? I believe so.

tschuett avatar Jun 01 '22 20:06 tschuett

Given that this is a public-facing change, a FCP is probably more appropriate?

nbdd0121 avatar Jun 04 '22 20:06 nbdd0121

I don't have permission to start an fcp. Also should this be a compiler or lang team fcp?

bjorn3 avatar Jun 14 '22 14:06 bjorn3

:umbrella: The latest upstream changes (presumably #96285) made this pull request unmergeable. Please resolve the merge conflicts.

bors avatar Jun 15 '22 00:06 bors

Discussed in T-compiler meeting.

@rfcbot fcp merge

pnkfelix avatar Jul 21 '22 14:07 pnkfelix

Team member @pnkfelix has proposed to merge this. The next step is review by the rest of the tagged team members:

  • [x] @Aaron1011
  • [x] @cjgillot
  • [x] @davidtwco
  • [x] @eddyb
  • [x] @estebank
  • [x] @lcnr
  • [x] @matthewjasper
  • [x] @michaelwoerister
  • [x] @nagisa
  • [ ] @nikomatsakis
  • [x] @oli-obk
  • [x] @petrochenkov
  • [x] @pnkfelix
  • [x] @wesleywiser

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot avatar Jul 21 '22 14:07 rfcbot

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot avatar Aug 05 '22 15:08 rfcbot

Yeah, I am just trying to look at it from the shoes of users working on proprietary software where it is common to strip it and clean/obfuscate/tweak binaries in further ways.

Hey, that's me! I suppose rust won't have an option to toggle this, and we'll have to use objcopy to remove the section?

roblabla avatar Aug 11 '22 10:08 roblabla

You can only use strip --remove-section=.comment I think. You have to do this anyway to remove info about the linker that was used and which C compiler was used.

bjorn3 avatar Aug 11 '22 10:08 bjorn3

you started a FCP, but it is not obvious which option you are proposing should be selected:

The RFC part of this PR is about which behavior should rustc follow:

  • Always add it.
  • Add it by default, i.e. have an opt-out flag (GCC, Clang).
  • Have an opt-in flag.
  • Never add it (current).

programmerjake avatar Aug 12 '22 09:08 programmerjake

you started a FCP, but it is not obvious which option you are proposing should be selected:

The RFC part of this PR is about which behavior should rustc follow:

  • Always add it.
  • Add it by default, i.e. have an opt-out flag (GCC, Clang).
  • Have an opt-in flag.
  • Never add it (current).

I didn't start the FCP, but personally I would go for one of the first two for consistency with GCC and Clang as long as the release notes mention the new section being added by default so that users are aware of it.

The PR implements the first, which is the simplest -- if that one goes in, the opt-out flag can always be added later if enough users claim the need for it.

ojeda avatar Aug 12 '22 14:08 ojeda

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

rfcbot avatar Aug 15 '22 15:08 rfcbot

For future reference: rustc has always embedded its version in the .rustc section. If you need to determine the compiler version for older binaries (built before this RP shipped), use the .rustc section instead.

This is the code responsible for embedding it:

https://github.com/rust-lang/rust/blob/2e35f954ada0f0c777844dc4fa66684efe90a035/compiler/rustc_codegen_ssa/src/back/metadata.rs#L262-L314

Shnatsel avatar Sep 01 '22 22:09 Shnatsel

For future reference: rustc has always embedded its version in the .rustc section.

This section only exists in rust dylibs and proc macros. It does not exist in executables and cdylibs.

bjorn3 avatar Sep 02 '22 06:09 bjorn3

I see. The fallback for older versions would be parsing debuginfo, then, as described here.

Shnatsel avatar Sep 02 '22 09:09 Shnatsel

Addressed @bjorn3's comment and moved use libc::c_uint; to nearby std:: ones, like in metadata.rs which the commit touches (it isn't entirely consistent across all files, but seemed better).

ojeda avatar Sep 10 '22 18:09 ojeda

@bors r+

bjorn3 avatar Sep 10 '22 19:09 bjorn3