rust
rust copied to clipboard
[RFC] Support `.comment` section like GCC/Clang (`!llvm.ident`)
Both GCC and Clang write by default a .comment
section with compiler information:
$ gcc -c -xc /dev/null && readelf -p '.comment' null.o
String dump of section '.comment':
[ 1] GCC: (GNU) 11.2.0
$ clang -c -xc /dev/null && readelf -p '.comment' null.o
String dump of section '.comment':
[ 1] clang version 14.0.1 (https://github.com/llvm/llvm-project.git c62053979489ccb002efe411c3af059addcb5d7d)
They also implement the -Qn
flag to avoid doing so:
$ gcc -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!
$ clang -Qn -c -xc /dev/null && readelf -p '.comment' null.o
readelf: Warning: Section '.comment' was not dumped because it does not exist!
So far, rustc
only does it for WebAssembly targets and only when debug info is enabled:
$ echo 'fn main(){}' | rustc --target=wasm32-unknown-unknown --emit=llvm-ir -Cdebuginfo=2 - && grep llvm.ident rust_out.ll
!llvm.ident = !{!27}
The RFC part of this PR is about which behavior should rustc
follow:
- Always add it.
- Add it by default, i.e. have an opt-out flag (GCC, Clang).
- Have an opt-in flag.
- Never add it (current).
There is also the question of whether debug info being enabled matters for that decision, given the current behavior of WebAssembly targets.
For instance, adding it by default gets us closer to other popular compilers, but that may surprise some users with an information leak. The most conservative option is to only do so opt-in, even if debug info is enabled (some users may be stripping debug info and not expecting something else to be leaked elsewhere).
Implementation-wise, this covers both ModuleLlvm::new()
and ModuleLlvm::new_metadata()
cases by moving the addition to context::create_module
and adds a few test cases.
ThinLTO also sees the llvm.ident
named metadata duplicated (in temporary outputs), so this deduplicates it like it is done for wasm.custom_sections
. The tests also check this duplication does not take place.
r? @lcnr
(rust-highfive has picked a reviewer for you, use r? to override)
The job x86_64-gnu-llvm-12
failed! Check out the build log: (web) (plain)
Click to see the possible cause of the failure (guessed by this bot)
error: make failed
status: exit status: 2
command: "make"
--- stdout -------------------------------
# `-Ccodegen-units=16 -Copt-level=2` is used here to trigger thin LTO
# across codegen units to test deduplication of the named metadata
# (see `LLVMRustPrepareThinLTOImport` for details).
echo 'fn main(){}' | LD_LIBRARY_PATH="/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/llvm-ident/llvm-ident:/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/lib" '/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/llvm-ident/llvm-ident -L /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/llvm-ident/llvm-ident - --emit=link,obj -Csave-temps -Ccodegen-units=16 -Copt-level=2
# `llvm-dis` is used here since `--emit=llvm-ir` does not emit LLVM IR
# for temporary outputs.
"/usr/lib/llvm-12/bin/llvm-dis" /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/llvm-ident/llvm-ident/*.bc
--- stderr -------------------------------
--- stderr -------------------------------
warning: ignoring emit path because multiple .o files were produced
warning: 1 warning emitted
llvm-dis: Too many positional arguments specified!
Can specify at most 1 positional arguments: See: /usr/lib/llvm-12/bin/llvm-dis --help
make: *** [Makefile:12: all] Error 1
failures:
I'm surprised this wasn't already added.
For instance, adding it by default gets us closer to other popular compilers, but that may surprise some users with an information leak.
I am pretty sure it is already possible to reliably identify the exact rustc version used soly based on the standard library functions linked into the executable. Without stripping you can trivially lookup the stable crate id from a libcore symbol name (the stable crate id depends among other things on the rustc version) and even without symbols I'm pretty sure there are enough differences between versions to identify which rustc is used. In any case it will be possible to at least identify that rustc is used if you don't restrict yourself to #![no_std]
and avoid calling every non-trivial function in libcore and enable LTO.
Without stripping
Yeah, I am just trying to look at it from the shoes of users working on proprietary software where it is common to strip it and clean/obfuscate/tweak binaries in further ways. Some of them may suddenly realize their executable is now explicitly marked and come to complain about it (even if they should be checking for extraneous sections anyway).
even without symbols
Definitely, I am sure it is possible to find fancier ways. I wouldn't be surprised if NNs are being applied to the problem already. :)
r? @bjorn3
Considering that this is annotated as [RFC], maybe you should open a PR to https://github.com/rust-lang/rfcs/ instead? Don't know how complex of a decision this is, so maybe a compiler team MCP is enough?
(@bjorn3 To be clear, I wouldn't mind doing it by default and getting closer to GCC and Clang; I am just playing devil's advocate)
Considering that this is annotated as [RFC], maybe you should open a PR to https://github.com/rust-lang/rfcs/ instead? Don't know how complex of a decision this is, so maybe a compiler team MCP is enough?
Definitely, if it is deemed necessary, I can write one. I wasn't sure, so I tagged it here for the moment. Thanks!
FWIW, rustc is already setting the producer string and adding it to debuginfo: https://github.com/rust-lang/rust/blob/0f06824013761ed6829887019033f1001e68f623/compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs#L813-L816
This is what you get when you run strings on rustc output that has debuginfo:
$ echo 'fn main(){}' | rustc -Cdebuginfo=2 - && strings rust_out | rg "rustc version"
clang LLVM (rustc version 1.62.0-beta.2 (daf68b1f7 2022-05-19))
This is what you get when you run strings on rustc output that has debuginfo:
Yeah, that is DW_AT_producer
; however sometimes the debug info may not be available (this came up as a possibility to know whether an object file came from C or Rust sources; with DWARF one may directly look into DW_AT_language
instead for this purpose). It is also good to be consistent with the other compilers even if debug info is available in case some tooling looks for .comment
.
Maybe I missed the motivation for doing this:
- consistency with clang and gcc
- you have a use case: I need it for the linux kernel?
- something else
It is quite useful for easily figuring out which rustc version an executable was built with. This could for example be used to determine that it needs to be rebuilt if the rustc version it was built with has a CVE, but there are many more reasons to need which compiler version an executable was built with.
That does indeed sound useful.
- consistency with clang and gcc
- you have a use case: I need it for the linux kernel?
To clarify a bit on those two you mention:
-
Tooling (and/or users) may expect or take advantage of the
.comment
section (e.g. it would be one less difference between C and Rust object files in the kernel, there is the security monitoring point from @bjorn3, etc.).Thus being consistent with GCC and Clang/LLVM is likely a good idea for those reasons (i.e. being consistent has other benefits; it is not just for the sake of consistency).
-
The kernel could use it (in principle) to distinguish C vs. Rust kernel modules when there is no debug info around. One can achieve the same in other ways, though.
I do not want to say "we need it", because we might end up not using it, but it would be nice to have for the other reasons.
Another point of view (not exactly a motivation, but related) is that one could argue it should be left up to the backend, e.g. if LLVM usually does it (when given the information), then it sounds reasonable to let it do it. rustc
could still provide a way for users to opt-out that backends should respect. Maybe target specs could carry that info too. (cc @antoyo for GCC).
Sorry. I believe consistency is it should be named .comment and not idk .version.
The real question is: is it helpful for rust users? I believe so.
Given that this is a public-facing change, a FCP is probably more appropriate?
I don't have permission to start an fcp. Also should this be a compiler or lang team fcp?
:umbrella: The latest upstream changes (presumably #96285) made this pull request unmergeable. Please resolve the merge conflicts.
Discussed in T-compiler meeting.
@rfcbot fcp merge
Team member @pnkfelix has proposed to merge this. The next step is review by the rest of the tagged team members:
- [x] @Aaron1011
- [x] @cjgillot
- [x] @davidtwco
- [x] @eddyb
- [x] @estebank
- [x] @lcnr
- [x] @matthewjasper
- [x] @michaelwoerister
- [x] @nagisa
- [ ] @nikomatsakis
- [x] @oli-obk
- [x] @petrochenkov
- [x] @pnkfelix
- [x] @wesleywiser
No concerns currently listed.
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
:bell: This is now entering its final comment period, as per the review above. :bell:
Yeah, I am just trying to look at it from the shoes of users working on proprietary software where it is common to strip it and clean/obfuscate/tweak binaries in further ways.
Hey, that's me! I suppose rust won't have an option to toggle this, and we'll have to use objcopy
to remove the section?
You can only use strip --remove-section=.comment
I think. You have to do this anyway to remove info about the linker that was used and which C compiler was used.
you started a FCP, but it is not obvious which option you are proposing should be selected:
The RFC part of this PR is about which behavior should
rustc
follow:
- Always add it.
- Add it by default, i.e. have an opt-out flag (GCC, Clang).
- Have an opt-in flag.
- Never add it (current).
you started a FCP, but it is not obvious which option you are proposing should be selected:
The RFC part of this PR is about which behavior should
rustc
follow:
- Always add it.
- Add it by default, i.e. have an opt-out flag (GCC, Clang).
- Have an opt-in flag.
- Never add it (current).
I didn't start the FCP, but personally I would go for one of the first two for consistency with GCC and Clang as long as the release notes mention the new section being added by default so that users are aware of it.
The PR implements the first, which is the simplest -- if that one goes in, the opt-out flag can always be added later if enough users claim the need for it.
The final comment period, with a disposition to merge, as per the review above, is now complete.
As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.
This will be merged soon.
For future reference: rustc has always embedded its version in the .rustc
section. If you need to determine the compiler version for older binaries (built before this RP shipped), use the .rustc
section instead.
This is the code responsible for embedding it:
https://github.com/rust-lang/rust/blob/2e35f954ada0f0c777844dc4fa66684efe90a035/compiler/rustc_codegen_ssa/src/back/metadata.rs#L262-L314
For future reference: rustc has always embedded its version in the .rustc section.
This section only exists in rust dylibs and proc macros. It does not exist in executables and cdylibs.
I see. The fallback for older versions would be parsing debuginfo, then, as described here.
Addressed @bjorn3's comment and moved use libc::c_uint;
to nearby std::
ones, like in metadata.rs
which the commit touches (it isn't entirely consistent across all files, but seemed better).
@bors r+