syzkaller icon indicating copy to clipboard operation
syzkaller copied to clipboard

pkg/report: syzkaller did not recognise the `not syncing: panic_on_warn set ...` report as corrupted

Open a-nogikh opened this issue 3 years ago • 4 comments

Dashboard link: https://syzkaller.appspot.com/bug?extid=8d41ad9c88279d71f7c9

Syzkaller should have recognised this report as corrupted: https://github.com/google/syzkaller/blob/3ce60af85b4644bd7f2d2ef01a7d2b9a41545b76/pkg/report/linux.go#L1907-L1912

Running linux.Parse against the log actually results in a report with Corrupted = true. But still the bug was not put into the corrupted report bucket on syzbot.

Log:

[   65.286679][   T22] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[   65.286692][   T22] RIP: 0010:kvm_fastop_exception+0xf6a/0x1058
[   65.286719][   T22] Code: d3 ed e9 ef d4 6e f8 49 8d 0e 48 83 e1 f8 4c 8b 21 41 8d 0e 83 e1 07 c1 e1 03 49 d3 ec e9 45 e2 6e f8 49 8d 4d 00 48 83 e1 f8 <4c> 8b 21 41 8d 4d 00 83 e1 07 c1 e1 03 49 d3 ec e9 35 ec 6e f8 bd
[   65.286742][   T22] RSP: 0018:ffffc90000dcfae8 EFLAGS: 00010282
[   65.286762][   T22] RAX: 0000003361736376 RBX: ffff88806f1e3068 RCX: ffff88823bd14020
[   65.286779][   T22] RDX: ffffed100de3c614 RSI: 0000000000000005 RDI: 0000000000000007
[   65.286795][   T22] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88806f1e3098
[   65.286810][   T22] R10: ffffed100de3c613 R11: 0000000000000000 R12: ffff88823bd14020
[   65.286826][   T22] R13: ffff88823bd14020 R14: ffff88806f1e3098 R15: dffffc0000000000
[   65.286843][   T22] FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
[   65.286866][   T22] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   65.286883][   T22] CR2: ffff88823bd14020 CR3: 000000007253a000 CR4: 00000000001506f0
[   65.286899][   T22] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   65.286914][   T22] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   65.286929][   T22] Call Trace:
[   65.286941][   T22]  d_lookup+0xd8/0x170
[   65.286975][   T22]  lookup_dcache+0x1e/0x130
[   65.287001][   T22]  __lookup_hash+0x29/0x180
[   65.287028][   T22]  kern_path_locked+0x17e/0x320
[   65.287056][   T22]  ? filename_lookup+0x80/0x80
[   65.287089][   T22]  handle_remove+0xa2/0x5fe
[   65.287117][   T22]  ? cacheinfo_cpu_online.cold+0x3e/0x3e
[   65.287148][   T22]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[   65.287176][   T22]  ? finish_task_switch.isra.0+0x232/0xa50
[   65.287207][   T22]  ? find_held_lock+0x2d/0x110
[   65.287231][   T22]  ? devtmpfsd+0xaa/0x2a3
[   65.287256][   T22]  ? lock_downgrade+0x6e0/0x6e0
[   65.287279][   T22]  ? do_raw_spin_lock+0x120/0x2b0
[   65.287305][   T22]  ? rwlock_bug.part.0+0x90/0x90
[   65.287333][   T22]  devtmpfsd+0x1b9/0x2a3
[   65.287357][   T22]  ? dmar_validate_one_drhd+0x24d/0x24d
[   65.287384][   T22]  kthread+0x3e5/0x4d0
[   65.287409][   T22]  ? set_kthread_struct+0x130/0x130
[   65.287439][   T22]  ret_from_fork+0x1f/0x30
[   65.287469][   T22] ==================================================================
[   65.287478][   T22] Kernel panic - not syncing: panic_on_warn set ...
[   65.287727][   T22] CPU: 0 PID: 22 Comm: kdevtmpfs Not tainted 5.14.0-syzkaller #0
[   65.287754][   T22] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[   65.287769][   T22] Call Trace:
[   65.287778][   T22]  dump_stack_lvl+0xcd/0x134
[   65.287808][   T22]  panic+0x2b0/0x6dd
[   65.287832][   T22]  ? __warn_printk+0xf3/0xf3
[   65.287869][   T22]  ? kfence_report_error.cold+0x2f3/0xa56
[   65.287905][   T22]  kfence_report_error.cold+0x304/0xa56
[   65.287934][   T22]  ? kvm_fastop_exception+0xf6a/0x1058
[   65.288036][   T22]  ? kfence_print_object+0x350/0x350
[   65.288068][   T22]  ? add_lock_to_list.constprop.0+0x185/0x370
[   65.288092][   T22]  ? kvm_fastop_exception+0xf6a/0x1058
[   65.288116][   T22]  ? d_lookup+0xd8/0x170
[   65.288141][   T22]  ? lookup_dcache+0x1e/0x130
[   65.288167][   T22]  ? __lookup_hash+0x29/0x180
[   65.288192][   T22]  ? kern_path_locked+0x17e/0x320
[   65.288216][   T22]  ? handle_remove+0xa2/0x5fe
[   65.288243][   T22]  ? devtmpfsd+0x1b9/0x2a3
[   65.288267][   T22]  ? kthread+0x3e5/0x4d0
[   65.288294][   T22]  ? ret_from_fork+0x1f/0x30
[   65.288370][   T22]  ? mark_lock+0xef/0x17b0
[   65.288411][   T22]  kfence_handle_page_fault+0x203/0x630
[   65.288449][   T22]  page_fault_oops+0x1c5/0x6b0
[   65.288473][   T22]  ? __sanitizer_cov_trace_const_cmp2+0x22/0x80
[   65.288504][   T22]  ? is_prefetch.constprop.0+0xb7/0x530
[   65.288532][   T22]  ? dump_pagetable+0xc0/0xc0
[   65.288558][   T22]  ? pgtable_bad+0x90/0x90
[   65.288588][   T22]  ? search_bpf_extables+0x1d1/0x2f0
[   65.288614][   T22]  ? kvm_fastop_exception+0xf6a/0x1058
[   65.288648][   T22]  kernelmode_fixup_or_oops+0x219/0x280
[   65.288681][   T22]  __bad_area_nosemaphore+0x36d/0x400
[   65.288710][   T22]  ? spurious_kernel_fault+0x38d/0x5b0
[   65.288742][   T22]  do_kern_addr_fault+0x5b/0x70
[   65.288772][   T22]  exc_page_fault+0x155/0x180
[   65.288802][   T22]  asm_exc_page_fault+0x1e/0x30
[   65.288830][   T22] RIP: 0010:kvm_fastop_exception+0xf6a/0x1058
[   65.288858][   T22] Code: d3 ed e9 ef d4 6e f8 49 8d 0e 48 83 e1 f8 4c 8b 21 41 8d 0e 83 e1 07 c1 e1 03 49 d3 ec e9 45 e2 6e f8 49 8d 4d 00 48 83 e1 f8 <4c> 8b 21 41 8d 4d 00 83 e1 07 c1 e1 03 49 d3 ec e9 35 ec 6e f8 bd
[   65.288882][   T22] RSP: 0018:ffffc90000dcfae8 EFLAGS: 00010282
[   65.288904][   T22] RAX: 0000003361736376 RBX: ffff88806f1e3068 RCX: ffff88823bd14020
[   65.288922][   T22] RDX: ffffed100de3c614 RSI: 0000000000000005 RDI: 0000000000000007
[   65.288939][   T22] RBP: 0000000000000005 R08: 0000000000000000 R09: ffff88806f1e3098
[   65.289042][   T22] R10: ffffed100de3c613 R11: 0000000000000000 R12: ffff88823bd14020
[   65.289063][   T22] R13: ffff88823bd14020 R14: ffff88806f1e3098 R15: dffffc0000000000
[   65.289100][   T22]  ? __d_lookup+0x372/0x720
[   65.289143][   T22]  d_lookup+0xd8/0x170
[   65.289174][   T22]  lookup_dcache+0x1e/0x130
[   65.289204][   T22]  __lookup_hash+0x29/0x180
[   65.289236][   T22]  kern_path_locked+0x17e/0x320
[   65.289266][   T22]  ? filename_lookup+0x80/0x80
[   65.289309][   T22]  handle_remove+0xa2/0x5fe
[   65.289343][   T22]  ? cacheinfo_cpu_online.cold+0x3e/0x3e
[   65.289380][   T22]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[   65.289408][   T22]  ? finish_task_switch.isra.0+0x232/0xa50
[   65.289449][   T22]  ? find_held_lock+0x2d/0x110
[   65.289479][   T22]  ? devtmpfsd+0xaa/0x2a3
[   65.289506][   T22]  ? lock_downgrade+0x6e0/0x6e0
[   65.289533][   T22]  ? do_raw_spin_lock+0x120/0x2b0
[   65.289563][   T22]  ? rwlock_bug.part.0+0x90/0x90
[   65.289612][   T22]  devtmpfsd+0x1b9/0x2a3
[   65.289643][   T22]  ? dmar_validate_one_drhd+0x24d/0x24d
[   65.289676][   T22]  kthread+0x3e5/0x4d0
[   65.289703][   T22]  ? set_kthread_struct+0x130/0x130
[   65.289736][   T22]  ret_from_fork+0x1f/0x30
[   65.291109][   T22] Kernel Offset: disabled

a-nogikh avatar Sep 13 '21 11:09 a-nogikh

These are not put into "corrupted report" bucket because they have reproducers. The idea was that we don't want to lose reproducers and it still may be actionable. But I was confused about "why this corrupted report is reported" in the past as well.

The question is: why the KFENCE output is systematically truncated? @melver We also have this logic that should try to obtain a non-corrupted report for reproducers if possible: https://github.com/google/syzkaller/blob/master/pkg/repro/repro.go#L164-L175 (maybe it's broken? it's not unit-tested)

dvyukov avatar Sep 15 '21 05:09 dvyukov

Another bug where KFENCE report is truncated: https://syzkaller.appspot.com/bug?extid=ca31fd785c3041f508e4

dvyukov avatar Sep 15 '21 18:09 dvyukov

This is strange -- are the KFENCE reports too long? Perhaps the dumping of registers confuses something?

melver avatar Sep 15 '21 18:09 melver

We save up to 3MB of output (or something similar). And in either case we should preserve the bug title line (if it would be in the output).

I think I may see what's happening. Though, it's interesting that it manifested only on KFENCE reports as it should affect all reports.

All these reports happen very early (around 60-th second) and there is no fuzzer output yet. I suspect the bug happens right around the time we start syz-fuzzer and just connect to the serial console. And we happen to start reading the output in the middle of KFENCE report.

I think this API call returns the new console output: https://github.com/google/syzkaller/blob/07e953c105af057cb474bc086f68fb7ec5b241ec/vm/gce/gce.go#L224-L225

There is also this .reply-lines arg that allow to fetch previous output: https://github.com/google/syzkaller/blob/07e953c105af057cb474bc086f68fb7ec5b241ec/vm/gce/gce.go#L400-L401 Maybe we should do something like .reply-lines=1000 to get any report that is being currently printed.

... but then we will immediately detect an "unexpected reboot" crash... so it's not that simple.

Maybe it will help it we switch vm/gce to the scheme used by vm/qemu. Namely start reading console output right from the boot, rather than start reading it only when we start syz-fuzzer.

Overall there is too much duplication and too many differences between vm impls (for no reason). There is #607 to provide some framework for vm impls where they only need to provide implementation of primitive operations (like get console output).

dvyukov avatar Sep 15 '21 19:09 dvyukov