syzkaller
Inline function related changes
This PR decodes inline functions from the DWARF debug info (.debug_info) instead of relying only on symbols from the symbol table.
It also includes some fixes needed to fit the inline function change.
With these changes,
- module name + file name can be used as the unique ID for file mapping
- header files and inline functions can be distinguished.
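A minimal sketch of the first point, not the PR's actual data structures: `fileKey`, `fileStats`, and `addPC` are hypothetical names used only for illustration. It shows why keying a file map by (module, path) keeps the same header apart when it is compiled into more than one module.

```go
package main

import "fmt"

// fileKey is a hypothetical composite key: the same source path can show up
// in more than one kernel module, so the path alone is not unique.
type fileKey struct {
	Module string
	Path   string
}

// fileStats holds hypothetical per-file counters.
type fileStats struct {
	TotalPCs   int
	CoveredPCs int
}

// addPC records one PC for (module, path), creating the entry on first use.
func addPC(files map[fileKey]*fileStats, module, path string, covered bool) {
	k := fileKey{Module: module, Path: path}
	st := files[k]
	if st == nil {
		st = &fileStats{}
		files[k] = st
	}
	st.TotalPCs++
	if covered {
		st.CoveredPCs++
	}
}

func main() {
	files := make(map[fileKey]*fileStats)
	// In this sketch an empty module name stands for the main kernel image.
	addPC(files, "", "include/linux/list.h", true)
	addPC(files, "ext4", "include/linux/list.h", false)
	addPC(files, "ext4", "include/linux/list.h", true)
	for k, st := range files {
		fmt.Printf("%q %s: %d/%d\n", k.Module, k.Path, st.CoveredPCs, st.TotalPCs)
	}
}
```

With a path-only key, the three PCs above would collapse into a single entry.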
Codecov Report
Merging #3144 (7e28df4) into master (e7f9308) will decrease coverage by 0.2%. The diff coverage is 30.5%.
| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/cover/backend/backend.go | 0.0% <ø> (ø) | |
| pkg/cover/backend/dwarf.go | 0.0% <0.0%> (ø) | |
| pkg/cover/backend/elf.go | 0.0% <0.0%> (ø) | |
| pkg/cover/report.go | 77.9% <71.9%> (-3.7%) | :arrow_down: |
| pkg/cover/html.go | 67.0% <89.9%> (+6.9%) | :arrow_up: |
| pkg/csource/csource.go | 76.5% <0.0%> (-4.2%) | :arrow_down: |
| prog/mutation.go | 87.7% <0.0%> (-1.0%) | :arrow_down: |
| prog/any.go | 83.5% <0.0%> (-1.0%) | :arrow_down: |
| prog/rand.go | 90.6% <0.0%> (-0.8%) | :arrow_down: |
| dashboard/app/main.go | 72.2% <0.0%> (-0.4%) | :arrow_down: |
| ... and 11 more | | |
Hi Joey,
Please rename commit "qcom: show percent of covered pcs in funcs" to "pkg/cover: show percent of covered pcs in funcs".
I've reviewed only the first 2 commits for now:
- pkg/cover: show module in /funccover and /filecover
- qcom: show percent of covered pcs in funcs
They look good to me. The others look more complex. If you split these 2 commits into a separate PR, we can merge them sooner. It will also help to make the remaining part more manageable.
The two commits ("pkg/cover: show module in /funccover and /filecover" and "qcom: show percent of covered pcs in funcs") are split into https://github.com/google/syzkaller/pull/3157
I compared upstream syzkaller with this PR on Linux v5.18.1 (0047d57e6c91177bb731bed5ada6c211868bc27c, compiled using this config with gcc 11.2.0).
I applied this simple patch to measure time.
For the current syzkaller code:
2022/06/03 18:31:51 initializing coverage information...
2022/06/03 18:31:53 MakeReportGenerator took 2.123130372s
2022/06/03 18:31:53 128209 PCs symbolized in 214.958776ms (avg. 1.676µs)
2022/06/03 18:31:56 VMs 4, executed 23995, cover 128335, signal 182206/183516, crashes 0, repro 0
<...>
2022/06/03 18:33:58 rg.prepareFileMap took 2m4.755647766s
With this PR, something went wrong:
2022/06/03 18:46:15 initializing coverage information...
2022/06/03 18:46:25 VMs 4, executed 25617, cover 127498, signal 179347/181538, crashes 0, repro 0
2022/06/03 18:46:35 VMs 4, executed 26020, cover 127977, signal 180287/182510, crashes 0, repro 0
2022/06/03 18:46:45 VMs 4, executed 26931, cover 129257, signal 182880/185004, crashes 0, repro 0
2022/06/03 18:46:55 VMs 4, executed 27737, cover 130117, signal 184399/186491, crashes 0, repro 0
2022/06/03 18:47:05 VMs 4, executed 28394, cover 130595, signal 185344/187480, crashes 0, repro 0
2022/06/03 18:47:15 VMs 4, executed 28867, cover 131216, signal 186593/188635, crashes 0, repro 0
2022/06/03 18:47:25 VMs 4, executed 29627, cover 132303, signal 188728/190766, crashes 0, repro 0
2022/06/03 18:47:35 VMs 4, executed 29938, cover 132855, signal 190436/191572, crashes 0, repro 0
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x80ed9e]
goroutine 291076 [running]:
github.com/google/syzkaller/pkg/cover/backend.readAllInlinedSubroutines(0xc00073a180, 0x9b312a?, 0xc0aeb489c0)
/home/nogikh/syzkaller/pkg/cover/backend/dwarf.go:780 +0x23e
github.com/google/syzkaller/pkg/cover/backend.readSymbolsFromDwarf.func1.1(0x15c71b0?)
/home/nogikh/syzkaller/pkg/cover/backend/dwarf.go:829 +0x45
created by github.com/google/syzkaller/pkg/cover/backend.readSymbolsFromDwarf.func1
/home/nogikh/syzkaller/pkg/cover/backend/dwarf.go:828 +0x47b
dwarf
You are right, I only tested a Linux kernel compiled with clang. Let me check. I think gcc encodes the DWARF info differently.
Verified: a gcc-compiled kernel won't work, because the DWARF is encoded in a different format.
DW_AT_sibling is used, and cap_task_setnice doesn't have range attributes (no DW_AT_low_pc/DW_AT_high_pc):
0x02ffd135: DW_TAG_subprogram
DW_AT_external (true)
DW_AT_name ("cap_task_setnice")
DW_AT_decl_line (1148)
DW_AT_prototyped (true)
DW_AT_type (0x02fe81c6 "int")
DW_AT_sibling (0x02ffd15d)
0x02ffd15d: DW_TAG_subprogram
DW_AT_external (true)
DW_AT_name ("cap_task_setioprio")
DW_AT_decl_line (1135)
DW_AT_prototyped (true)
DW_AT_type (0x02fe81c6 "int")
DW_AT_inline (DW_INL_inlined)
DW_AT_sibling (0x02ffd185)
0x02ffd185: DW_TAG_subprogram
DW_AT_external (true)
DW_AT_name ("cap_task_setscheduler")
DW_AT_decl_line (1122)
DW_AT_prototyped (true)
DW_AT_type (0x02fe81c6 "int")
DW_AT_low_pc (0xffffffff8067d190)
DW_AT_high_pc (0xffffffff8067d1a2)
DW_AT_frame_base (DW_OP_call_frame_cfa)
DW_AT_GNU_all_call_sites (true)
DW_AT_sibling (0x02ffd1d0)
0x02ffd1d0: DW_TAG_subprogram
DW_AT_name ("cap_safe_nice")
DW_AT_decl_line (1101)
DW_AT_prototyped (true)
DW_AT_type (0x02fe81c6 "int")
DW_AT_low_pc (0xffffffff8067d090)
DW_AT_high_pc (0xffffffff8067d181)
DW_AT_frame_base (DW_OP_call_frame_cfa)
DW_AT_GNU_all_call_sites (true)
DW_AT_sibling (0x02ffd52f)
0x02ffd52f: DW_TAG_array_type
DW_AT_type (0x02fe8156 "char")
DW_AT_sibling (0x02ffd53f)
0x02ffd53f: DW_TAG_subprogram
DW_AT_external (true)
DW_AT_name ("cap_task_fix_setuid")
DW_AT_decl_line (1052)
DW_AT_prototyped (true)
DW_AT_type (0x02fe81c6 "int")
DW_AT_low_pc (0xffffffff8067cad0)
DW_AT_high_pc (0xffffffff8067ce8f)
DW_AT_frame_base (DW_OP_call_frame_cfa)
DW_AT_GNU_all_call_sites (true)
DW_AT_sibling (0x02ffdb61)
The dump below shows that a clang-compiled kernel does have range info for the same function:
grep cap_task_setnice -A 10 -B 10 dwarf-vmlinux.txt
0x01f50383: NULL
0x01f50384: NULL
0x01f50385: DW_TAG_subprogram
DW_AT_low_pc (0xffffffff80722c60)
DW_AT_high_pc (0xffffffff80722d19)
DW_AT_frame_base (DW_OP_reg7 RSP)
DW_AT_GNU_all_call_sites (true)
DW_AT_name ("cap_task_setnice")
DW_AT_decl_line (1148)
DW_AT_prototyped (true)
DW_AT_type (0x01f3fd2b "int")
DW_AT_external (true)
0x01f5039f: DW_TAG_formal_parameter
DW_AT_location (0x00ac9cff:
[0xffffffff80722c60, 0xffffffff80722c6d): DW_OP_reg5 RDI
[0xffffffff80722c6d, 0xffffffff80722c91): DW_OP_reg3 RBX
Syzkaller should definitely be able to adequately handle both gcc- and clang- compiled kernels.
Right, I'm recoding some of the logic. It works for both gcc and clang now, but I still need to run some tests before pushing.
@a-nogikh I pushed the new code, please review again.
Tested on Linux master with gcc-9, gcc-11, and clang-14.
Hi Joey,
Yes, now I was able to run it. It's a bit slower (by ~15%, it seems, but I did not do many runs), which is not a big deal. The generated HTML file was also somewhat different.
And it's the expected difference in behavior that I now want to ask you about (yes, that should have been my first question here, but better late than never :) ).
- What was your specific problem you were trying to solve with this PR?
- What difference should a person that explores the HTML coverage report notice? It would be great if you gave some specific examples.
Some examples that seemed a bit confusing to me.
- Different sets of non-covered coverage points (red)
Upstream syzkaller:
Syzkaller from your PR:

What made that possible?
- Different sets of covered coverage points (black)
Upstream:

Your PR:

Right now syzkaller already extracts the specific affected lines via addr2line, and that tool also shows which inlined functions each queried PC belongs to. So, which changes in this PR account for the difference between these two screenshots?
@a-nogikh
- What was your specific problem you were trying to solve with this PR?
- The PR mainly addresses the fact that PCs inside inline functions and header files are not separated when /subsystemcover is shown. Header files and/or inline functions in header files can be used by multiple C files. If a header file is shared among different subsystems, we might not care about the coverage rate inside header files.
- It also fixes some confusing parts of the coverage display, and PCs not being assigned to the correct symbols in some corner cases in the current buildSymbols.
- What difference should a person that explores the HTML coverage report notice? It would be great if you gave some specific examples.
The difference in the displayed coverage is mainly caused by the change below. I think we shouldn't filter out other frames with the same PC.
- uniqueFrames := make(map[uint64]bool)
- var finalFrames []backend.Frame
- for _, frame := range rg.Frames {
- if !uniqueFrames[frame.PC] {
- uniqueFrames[frame.PC] = true
- finalFrames = append(finalFrames, frame)
- }
- }
- rg.Frames = finalFrames
+ sort.Slice(rg.Frames, func(i, j int) bool {
+ return rg.Frames[i].PC < rg.Frames[j].PC
+ })
When addr2line decodes a PC that has multiple frames, we should show all of these frames, to show explicitly that the enclosing function is hit too. The reason is that an inline function can have multiple callers, and therefore multiple representations in the DWARF section with different ranges. For example, if inline function A is called by two normal functions B and C, A has at least the ranges [start1, end1] and [start2, end2]. When one PC is decoded, current syzkaller shows that A is hit, but we don't know which normal function called it. With the fix above, we know whether it came from B or C. I think this makes the report easier for developers to understand, in a way more similar to gcov or lcov.
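To make the A/B/C example concrete, here is a small self-contained sketch. The `frame` type and the helper names are simplified stand-ins, not syzkaller's `backend.Frame` or its report code. It contrasts keeping one frame per PC with keeping every frame and sorting by PC.

```go
package main

import (
	"fmt"
	"sort"
)

// frame is a simplified stand-in for one symbolized frame of a PC.
type frame struct {
	PC   uint64
	Func string
}

// dedupByPC keeps only the first frame seen for each PC.
func dedupByPC(frames []frame) []frame {
	seen := make(map[uint64]bool)
	var out []frame
	for _, f := range frames {
		if !seen[f.PC] {
			seen[f.PC] = true
			out = append(out, f)
		}
	}
	return out
}

// sortByPC keeps every frame and orders a copy of the slice by PC.
func sortByPC(frames []frame) []frame {
	out := append([]frame(nil), frames...)
	sort.Slice(out, func(i, j int) bool { return out[i].PC < out[j].PC })
	return out
}

// funcsHit returns the sorted set of function names present in frames.
func funcsHit(frames []frame) []string {
	set := make(map[string]bool)
	for _, f := range frames {
		set[f.Func] = true
	}
	names := make([]string, 0, len(set))
	for name := range set {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Inline function A is expanded into B and into C; each PC therefore
	// symbolizes to two frames, innermost first (made-up addresses).
	frames := []frame{
		{0x1010, "A"}, {0x1010, "B"},
		{0x2020, "A"}, {0x2020, "C"},
	}
	fmt.Println(funcsHit(dedupByPC(frames))) // [A]
	fmt.Println(funcsHit(sortByPC(frames)))  // [A B C]
}
```

With one frame per PC, only the inline function A is reported as hit; keeping all frames also attributes the hit to the callers B and C.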
As for "Some examples that seemed a bit confusing to me", let me try to explain. From the DWARF extract below, we know audit_log_start is a normal function; if any PC inside it is hit, the function is hit too. The current syzkaller cover page shows that L1591 is hit, but it doesn't tell us whether audit_log_start is hit, so we have to search for the function to check. With this PR, we don't need to do that. The same applies to the current_cred function.
0x0129b33f: DW_TAG_subprogram
DW_AT_low_pc (0xffffffff817fc940)
DW_AT_high_pc (0xffffffff817fcd0b)
DW_AT_frame_base (DW_OP_reg6 RBP)
DW_AT_call_all_calls (true)
DW_AT_name ("audit_log_multicast")
DW_AT_decl_file ("/local/mnt/workspace/xx/linux/kernel/audit.c")
DW_AT_decl_line (1584)
DW_AT_prototyped (true)
0x0129ce93: DW_TAG_subprogram
DW_AT_low_pc (0xffffffff817f3320)
DW_AT_high_pc (0xffffffff817f3ce5)
DW_AT_frame_base (DW_OP_reg6 RBP)
DW_AT_call_all_calls (true)
DW_AT_name ("audit_log_start")
DW_AT_decl_file ("/local/mnt/workspace/xx/linux/kernel/audit.c")
DW_AT_decl_line (1846)
DW_AT_prototyped (true)
DW_AT_type (0x0129cd57 "audit_buffer *")
DW_AT_external (true)
0x012b813a: DW_TAG_subprogram
DW_AT_name ("audit_log_start")
DW_AT_decl_file ("/local/mnt/workspace/xx/linux/./include/linux/audit.h")
DW_AT_decl_line (162)
DW_AT_prototyped (true)
DW_AT_type (0x012b8153 "audit_buffer *")
DW_AT_declaration (true)
DW_AT_external (true)
For audit_log_format, the same applies to L1600. It's more reasonable to say that there is a PC reachable from L1600.
0x0129d319: DW_TAG_subprogram
DW_AT_low_pc (0xffffffff817f3cf0)
DW_AT_high_pc (0xffffffff817f3e00)
DW_AT_frame_base (DW_OP_reg6 RBP)
DW_AT_call_all_calls (true)
DW_AT_name ("audit_log_format")
DW_AT_decl_file ("/local/mnt/workspace/jiangenj/linux/kernel/audit.c")
DW_AT_decl_line (1992)
DW_AT_prototyped (true)
DW_AT_external (true)
0xffffffff8176f315
get_current
arch/x86/include/asm/current.h:15 (discriminator 2)
audit_log_multicast
kernel/audit.c:1600 (discriminator 2)
Different sets of covered coverage points (black): you can see there is a PC at ffffffff8176c4e2. It should be colored no matter whether it's hit.
0x022ba17a: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x022b82b3 "audit_ctl_owner_current")
DW_AT_entry_pc (0xffffffff8176c4e2)
DW_AT_GNU_entry_view (0x0000)
DW_AT_low_pc (0xffffffff8176c4e2)
DW_AT_high_pc (0xffffffff8176c4e7)
DW_AT_call_file ("/local/mnt/workspace/xx/linux.gcc/kernel/audit.c")
DW_AT_call_line (1868)
DW_AT_call_column (0x25)
DW_AT_sibling (0x022ba21a)
ffffffff8176c4dc: 0f 85 98 01 00 00 jne ffffffff8176c67a <audit_log_start.part.0+0x24a>
ffffffff8176c4e2: e8 99 18 03 00 callq ffffffff8179dd80 <__sanitizer_cov_trace_pc>
ffffffff8176c4e7: 48 39 2d 02 33 e7 0e cmp %rbp,0xee73302(%rip) # ffffffff905df7f0 <audit_cmd_mutex+0x90>
ffffffff8176c4ee: 0f 84 86 01 00 00 je ffffffff8176c67a <audit_log_start.part.0+0x24a>
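As a small illustration of how the DWARF range and the disassembly above line up, the sketch below treats DW_AT_low_pc/DW_AT_high_pc as a half-open range and checks two addresses against it. `addrRange` is a hypothetical type, and the sketch assumes the coverage point is attributed to the address of the call instruction itself.

```go
package main

import "fmt"

// addrRange is a half-open [Low, High) address range.
type addrRange struct {
	Low, High uint64
}

// contains reports whether pc lies inside the half-open range.
func (r addrRange) contains(pc uint64) bool {
	return r.Low <= pc && pc < r.High
}

func main() {
	// Range of the DW_TAG_inlined_subroutine for audit_ctl_owner_current.
	inlined := addrRange{Low: 0xffffffff8176c4e2, High: 0xffffffff8176c4e7}

	callPC := uint64(0xffffffff8176c4e2) // callq __sanitizer_cov_trace_pc
	nextPC := uint64(0xffffffff8176c4e7) // the cmp that follows the call

	fmt.Println(inlined.contains(callPC), inlined.contains(nextPC)) // true false
}
```

The call instruction is 5 bytes long, so in this dump the inlined range covers exactly that one instruction.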
I hope this answers all the questions.