gllvm Capturing the command-line arguments used on each translation unit

Hi there,

Does gllvm support capturing the command-line arguments (not underlying driver arguments) used on each translation unit?

For example, if I had the following runs:

gclang -o foo.o -flag1 -flag2 -flag3 foo.c
gclang -o bar.o -flag1 -flag2 -flag4 bar.c
gclang -o bar -lwhatever foo.o bar.o

I'd like the following mapping stored in a section somewhere in bar:

foo.o = clang -o foo.o -flag1 -flag2 -flag3 foo.c
bar.o = clang -o bar.o -flag1 -flag2 -flag4 bar.c
bar = clang -o bar -lwhatever foo.o bar.o

I'm aware that I can approximate this at the clang/LLVM level with -grecord-gcc-switches or -frecord-command-line, but was curious if I could do the same at the gllvm level.

This is something I could try to contribute, if there's interest.
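
As a rough illustration of the mapping described above, here is a minimal Python sketch of what a wrapper could record per invocation. It is not gllvm code: the helper names `output_of` and `record` are hypothetical, and it assumes the output is always given as a separate `-o <file>` pair (it does not handle forms such as `-ofoo.o` or a missing `-o`).

```python
import shlex

def output_of(argv):
    """Return the argument following -o, or None when no -o is present."""
    for i, arg in enumerate(argv):
        if arg == "-o" and i + 1 < len(argv):
            return argv[i + 1]
    return None

def record(mapping, wrapper_argv, real_compiler="clang"):
    """Store 'output -> real command line' for one wrapper invocation.

    wrapper_argv[0] is the wrapper name (e.g. gclang); it is replaced by
    real_compiler, which is an assumption made for this sketch.
    """
    out = output_of(wrapper_argv)
    if out is not None:
        mapping[out] = shlex.join([real_compiler] + wrapper_argv[1:])
    return mapping

# The three runs from the example above.
runs = [
    "gclang -o foo.o -flag1 -flag2 -flag3 foo.c",
    "gclang -o bar.o -flag1 -flag2 -flag4 bar.c",
    "gclang -o bar -lwhatever foo.o bar.o",
]

mapping = {}
for run in runs:
    record(mapping, shlex.split(run))

for out, cmd in mapping.items():
    print(f"{out} = {cmd}")
```

How the resulting mapping would be embedded in the final binary (for instance in an extra section) is left out here, since that depends on gllvm's internals.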

Apr 08 '20 19:04 woodruffw

So you want to create another section that contains the commands used to generate the compilation unit?

I guess that is possible, but it isn't in the code. It shouldn't be too hard; after all, that is how the bitcode is "recorded".

There has been occasional mumbling about needing something like this.

Apr 08 '20 19:04 ianamason

So you want to create another section that contains the commands used to generate the compilation unit?

Yep, exactly. And yeah, I figure I could reuse the current section techniques/code to stash it.

I'll look into it a bit.

Apr 08 '20 19:04 woodruffw

And I guess an additional switch to get-bc that dumps the information out to a file, like the manifest switch does.

Sounds reasonable. I remember @HassenSaidi complained that we lost the necessary information to relink the bitcode.

Apr 08 '20 20:04 ianamason

@ianamason @woodruffw: I did run into this issue in the past. I had more complex scenarios involving changes to the .o files between their creation and the linking. Imagine, for instance, changing the name of a symbol between generating the .o file and linking it. So to do this properly, the section containing the commands should be generated by tracking all file changes during the build process.

Apr 20 '20 21:04 HassenSaidi

So you gave up on this and created that: https://github.com/trailofbits/blight. I'll leave this here, so others can follow.

Jun 30 '21 17:06 ianamason

@woodruffw I took a look at blight, but I haven't tried it out. Is there any black magic other than creating a directory containing your wrappers and sticking that directory at the front of the PATH? What happens when build systems do bad things like call hard-coded paths to tools? Like /usr/bat/shit/crazy/clang?

Sep 08 '22 01:09 ianamason

@ianamason we use two techniques:

1. Most build systems respect CC, CXX, etc., so we simply point those to blight-cc, blight-c++, etc.
2. If that doesn't work (e.g. if a build hardcodes clang++ instead of using $(CC)), we do the $PATH trick you mentioned. In that case, /tmp/.../clang++ becomes a shim around blight-c++.

That leaves the worst case, i.e. a fully qualified path like /usr/bat/shit/crazy/clang. We don't handle those at all at the moment, since we (empirically) haven't run into too many real-world builds that actually do that. However, we could in theory handle those by tracing the child process's exec* family calls and looking for things that look like build tools. I believe that's what tools like bear do.
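
For readers who want to see the first two techniques concretely, here is a minimal Python sketch. It is not blight's actual implementation: the helpers `make_shim_dir` and `build_env` are hypothetical, the shims are assumed to be plain `/bin/sh` scripts on a POSIX system, and only the wrapper names blight-cc and blight-c++ come from the description above.

```python
import os
import shutil
import stat
import tempfile

def make_shim_dir(shims):
    """Create a temp directory of shell shims.

    `shims` maps a tool name (e.g. "clang++") to the wrapper command the
    shim should exec instead (e.g. "blight-c++").
    """
    shim_dir = tempfile.mkdtemp(prefix="shims-")
    for tool, wrapper in shims.items():
        path = os.path.join(shim_dir, tool)
        with open(path, "w") as f:
            # Forward all arguments unchanged to the wrapper.
            f.write(f'#!/bin/sh\nexec {wrapper} "$@"\n')
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return shim_dir

def build_env(shim_dir, base_env=None):
    """Return an environment that applies both techniques."""
    env = dict(os.environ if base_env is None else base_env)
    # Technique 1: builds that respect CC/CXX pick up the wrappers directly.
    env["CC"] = "blight-cc"
    env["CXX"] = "blight-c++"
    # Technique 2: builds that hardcode a bare tool name resolve to the shim,
    # because the shim directory comes first on PATH.
    env["PATH"] = shim_dir + os.pathsep + env.get("PATH", "")
    return env

shim_dir = make_shim_dir({"clang": "blight-cc", "clang++": "blight-c++"})
env = build_env(shim_dir)

# A bare "clang++" lookup now lands on the shim rather than the real compiler.
resolved = shutil.which("clang++", path=env["PATH"])
print(resolved)
```

A build would then be launched with this environment, e.g. `subprocess.run(["make"], env=env)`. As noted above, neither technique helps when the build invokes a fully qualified compiler path.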

Sep 08 '22 02:09 woodruffw

Thanks! I thought I saw a discussion that cmake doesn't respect AR, is that right?

Sep 08 '22 02:09 ianamason

That sounds right, although I'm not 100% sure -- I know they have their own CMAKE_AR variable instead, but I'm not sure if that's the sole variable or whether it just takes precedence.

Sep 08 '22 02:09 woodruffw
