gclang fails to identify input files inside linker groups

Open pietroborrello opened this issue 3 years ago • 7 comments

gclang does not seem to identify source input files that are listed inside a linker group.

Environment

  • gllvm version 1.3.0
  • go version go1.6.2
  • llvm version 10

To reproduce:

main.c:

#include <stdio.h>

int foo(int);

int main(int argc, char ** argv) {
    printf("%d\n", foo(argc));
}

lib.c:

int foo(int i) { 
    return i+1;
}

Building with gclang main.c -Wl,--start-group lib.c -Wl,--end-group -o main produces a valid binary. Running the same build with WLLVM_OUTPUT_LEVEL="DEBUG" prints:

INFO:Entering CC [main.c -Wl,--start-group lib.c -Wl,--end-group -o main]
DEBUG:Compile using parsed arguments:
InputList:         [main.c -Wl,--start-group lib.c -Wl,--end-group -o main]
InputFiles:        [main.c]
ObjectFiles:       []
OutputFilename:    main
CompileArgs:       []
LinkArgs:          [-Wl,--start-group lib.c -Wl,--end-group]
ForbiddenFlags:    []
IsVerbose:         false
IsDependencyOnly:  false
IsPreprocessOnly:  false
IsAssembleOnly:    false
IsAssembly:        false
IsCompileOnly:     false
IsEmitLLVM:        false
IsLTO:             false
IsPrintOnly:       false

DEBUG:buildObjectFile: [main.c -c -o .main.c.o]
DEBUG:Calling execCmd(/usr/lib/llvm-10/bin/clang, [main.c -Wl,--start-group lib.c -Wl,--end-group -o main])
DEBUG:execCmd: /usr/lib/llvm-10/bin/clang [main.c -c -o .main.c.o] had exitCode 0
DEBUG:execCmd: /usr/lib/llvm-10/bin/clang [-emit-llvm -c main.c -o .main.c.o.bc] had exitCode 0
DEBUG:execCmd: /usr/lib/llvm-10/bin/clang [main.c -Wl,--start-group lib.c -Wl,--end-group -o main] had exitCode 0
DEBUG:attachBitcodePathToObject recognized .o as something it can inject into.
DEBUG:execCmd: objcopy [--add-section .llvm_bc=/tmp/gllvm479332637 .main.c.o] had exitCode 0
DEBUG:execCmd: /usr/lib/llvm-10/bin/clang [.main.c.o -Wl,--start-group lib.c -Wl,--end-group -o main] had exitCode 0
INFO:LINKING: /usr/lib/llvm-10/bin/clang [.main.c.o -Wl,--start-group lib.c -Wl,--end-group -o main]
DEBUG:Calling [gclang main.c -Wl,--start-group lib.c -Wl,--end-group -o main] returned 0

As you can see, lib.c is not present in the InputFiles list.

Then executing WLLVM_OUTPUT_LEVEL="DEBUG" get-bc -b -S -o main.bc main prints:

DEBUG:defaultPath = llvm-ar
DEBUG:envPath = 
DEBUG:usrPath = llvm-ar
DEBUG:path = /usr/lib/llvm-10/bin/llvm-ar
DEBUG:defaultPath = llvm-link
DEBUG:envPath = 
DEBUG:usrPath = llvm-link
DEBUG:path = /usr/lib/llvm-10/bin/llvm-link
INFO:
ea.Verbose:            false
ea.WriteManifest:      false
ea.SortBitcodeFiles:   false
ea.BuildBitcodeModule: true
ea.KeepTemp:           false
ea.LinkArgSize:        0
ea.InputFile:          main
ea.OutputFile:         main.bc
ea.LlvmArchiverName:   /usr/lib/llvm-10/bin/llvm-ar
ea.LlvmLinkerName:     /usr/lib/llvm-10/bin/llvm-link
ea.ArchiverName:       ar
ea.StrictExtract:      true
INFO:handleExecutable: artifactPaths = [/tmp/.main.c.o.bc]
INFO:argMax = 1887436
DEBUG:execCmd: /usr/lib/llvm-10/bin/llvm-link [-o main.bc /tmp/.main.c.o.bc] had exitCode 0
Bitcode file extracted to: main.bc.
INFO:Calling [get-bc -b -S -o main.bc main] DID NOT TELL US WHAT HAPPENED

The call does not fail, but the resulting bitcode contains no definition of the function foo or of anything else in lib.c. I suspect the gllvm parser simply forwards the linker group to the linker and skips the bitcode generation phase for input files listed inside it. The code I suspect of being the culprit is here; testing with an older version of gllvm (1.2.7) does not show the bug.

I understand that a group like -Wl,--start-group lib.c -Wl,--end-group does not really make sense on its own, but I minimized the example on purpose: the issue appears whenever a source file is listed in a group alongside any other library or archive, as in the Android libhevc fuzzer build script.

pietroborrello avatar May 19 '21 16:05 pietroborrello

I would have to read the manual, but what is a linker supposed to do with a .c file?

OK, so a quick look tells me that the linker expects the files in a group to be archives or object files. So I think you/they are misusing the notion.

What does clang do in this case?

ianamason avatar May 19 '21 19:05 ianamason

I have to say that I share the feeling that this option should not be used with source files, but it does seem to be valid. clang automatically builds a temporary object file from the source (/tmp/lib-51e45b.o in the following example), which is then passed to the linker:

$ clang-10 -v main.c -Wl,--start-group lib.c -Wl,--end-group -o main
Ubuntu clang version 10.0.1-++20210405103842+ef32c611aa21-1~exp1~20210405084441.211
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/4.8.5
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8.5
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Candidate multilib: x32;@mx32
Selected multilib: .;@m64
 "/usr/lib/llvm-10/bin/clang" -cc1 -triple x86_64-pc-linux-gnu -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name main.c -mrelocation-model static -mthread-model posix -mframe-pointer=all -fmath-errno -fno-rounding-math -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dwarf-column-info -fno-split-dwarf-inlining -debugger-tuning=gdb -v -resource-dir /usr/lib/llvm-10/lib/clang/10.0.1 -internal-isystem /usr/local/include -internal-isystem /usr/lib/llvm-10/lib/clang/10.0.1/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdebug-compilation-dir /tmp/testg -ferror-limit 19 -fmessage-length 0 -fgnuc-version=4.2.1 -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -faddrsig -o /tmp/main-b43ce0.o -x c main.c
clang -cc1 version 10.0.1 based upon LLVM 10.0.1 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/llvm-10/lib/clang/10.0.1/include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
 "/usr/lib/llvm-10/bin/clang" -cc1 -triple x86_64-pc-linux-gnu -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name lib.c -mrelocation-model static -mthread-model posix -mframe-pointer=all -fmath-errno -fno-rounding-math -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -dwarf-column-info -fno-split-dwarf-inlining -debugger-tuning=gdb -v -resource-dir /usr/lib/llvm-10/lib/clang/10.0.1 -internal-isystem /usr/local/include -internal-isystem /usr/lib/llvm-10/lib/clang/10.0.1/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdebug-compilation-dir /tmp/testg -ferror-limit 19 -fmessage-length 0 -fgnuc-version=4.2.1 -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -faddrsig -o /tmp/lib-51e45b.o -x c lib.c
clang -cc1 version 10.0.1 based upon LLVM 10.0.1 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include
 /usr/lib/llvm-10/lib/clang/10.0.1/include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
 "/usr/bin/ld" -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o main /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../x86_64-linux-gnu/crt1.o /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../x86_64-linux-gnu/crti.o /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/crtbegin.o -L/usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0 -L/usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../.. -L/usr/lib/llvm-10/bin/../lib -L/lib -L/usr/lib /tmp/main-b43ce0.o --start-group /tmp/lib-51e45b.o --end-group -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/crtend.o /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../x86_64-linux-gnu/crtn.o

This behavior is shared with gcc, although I did not find it documented anywhere. Since the compilation happens implicitly, gclang misses the chance to create the bc file and to attach its path to the object.

pietroborrello avatar May 19 '21 20:05 pietroborrello

Sigh.

Freeping creaturism.

So I suppose one would look at the files in the group and, if any are source files, add them to the input files and replace each occurrence in the group with the soon-to-be-created object file.

All without breaking anything :-)

ianamason avatar May 19 '21 20:05 ianamason

Yes exactly, this would work perfectly.

I think there is another inconsistency in how gllvm deals with linker groups, also triggered by the Android libhevc fuzzer build script: the -o <output> option can occur inside the group itself, as in -Wl,--start-group lib.c -o main -Wl,--end-group. Clang transparently moves it out of the linker group while compiling (giving -o main -Wl,--start-group lib.c -Wl,--end-group), whereas gclang blindly forwards the whole group, fails to parse main as the OutputFile, and appends a -o a.out that overrides the original output destination.

Again, I believe the option should not be used this way, since there seems to be no side effect other than it being moved out of the group, but I am pointing it out here because it is used in a fairly important project, one selected as a benchmark in FuzzBench.

Thanks for this amazing project btw :)

pietroborrello avatar May 19 '21 22:05 pietroborrello

OK @pietroborrello I will ping you when I have something to test. Hopefully within a week. It's not on the top of my stack at the moment.

ianamason avatar May 20 '21 14:05 ianamason

@pietroborrello Out of curiosity, did you see a real build use linker groups in this way? Or was this something you discovered experimentally?

woodruffw avatar Mar 02 '22 22:03 woodruffw

At the time of the issue, it was used in the Android libhevc fuzzer build script of Google's OSS-Fuzz project.

The script now seems to have been updated to use cmake, so I'm not sure whether they still use it.

pietroborrello avatar Mar 02 '22 22:03 pietroborrello