drakvuf
Anyone experienced stalled CPU messages?
Hi, has anyone experienced stalled CPU messages from time to time when running multiple analyses in parallel?
Something along these lines:
kernel: [12912.643298] INFO: rcu_sched self-detected stall on CPU { 17} (t=5250 jiffies g=126982 c=126981 q=107473)
kernel: [12912.644896] NMI backtrace for cpu 17
kernel: [12912.644900] CPU: 17 PID: 8684 Comm: drakvuf Not tainted 3.16.0-4-amd64 #1 Debian 3.16.43-2+deb8u5
kernel: [12912.644904] task: ffff8809fa64f430 ti: ffff8809fc9fc000 task.ti: ffff8809fc9fc000
kernel: [12912.644907] RIP: e030:[<ffffffff8100130a>] [<ffffffff8100130a>] xen_hypercall_vcpu_op+0xa/0x20
kernel: [12912.644913] RSP: e02b:ffff880cce623cb8 EFLAGS: 00000046
kernel: [12912.644915] RAX: 0000000000000000 RBX: 0000000000000011 RCX: ffffffff8100130a
kernel: [12912.644918] RDX: 0000000000000000 RSI: 0000000000000011 RDI: 000000000000000b
kernel: [12912.644920] RBP: ffffffff818e3060 R08: ffffffff818e2b40 R09: 000000000000331c
kernel: [12912.644923] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff818e2b40
kernel: [12912.644926] R13: 0000000000000005 R14: 000000000001a3d1 R15: ffffffff818539c0
kernel: [12912.644937] FS: 00007fa401aa2780(0000) GS:ffff880cce620000(0000) knlGS:0000000000000000
kernel: [12912.644939] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [12912.644945] CR2: 000000000d97fa90 CR3: 00000008501da000 CR4: 0000000000042660
I'm just starting to nail down the issue, but any info would be appreciated. :)
Regards
Running multiple analyses in parallel has caused issues for me in the past. I'm not sure if it was this exact problem, though.
Just in case, let me reword my previous explanation: I'm running multiple drakvuf instances in parallel with multiple xl restore operations and multiple VMI sessions (without drakvuf) via vmifs.
The main issue I'm facing right now is random reboots performed by the watchdog:
[...]
(XEN) Watchdog timer fired for domain 0
(XEN) Hardware Dom0 shutdown: watchdog rebooting machine
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
I'm running out of ideas. I tested Xen v4.9.1 but ended up with the same result... :(
Running xl restore in parallel is problematic in my experience. While the toolstack should serialize it itself with a lock, I found it best to serialize it manually.
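
For what it's worth, here is a minimal sketch of what "serialize it manually" could look like, assuming the restores are launched from separate Python workers on the same dom0. The lock file path and the helper names (`exclusive_lock`, `serialized_run`, `restore_domain`) are made up for illustration; only `xl restore <CheckpointFile>` is the actual xl invocation. It uses an advisory `flock`, so it only helps if every worker goes through the same wrapper.

```python
import fcntl
import subprocess
from contextlib import contextmanager

# Hypothetical lock file path; any path shared by all workers would do.
LOCK_PATH = "/tmp/xl-restore.lock"


@contextmanager
def exclusive_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "w") as lock_file:
        # Blocks until no other process (or open file) holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def serialized_run(cmd, lock_path=LOCK_PATH):
    """Run `cmd` while holding the lock, so only one such call runs at a time."""
    with exclusive_lock(lock_path):
        return subprocess.run(cmd, check=False).returncode


def restore_domain(checkpoint_file, lock_path=LOCK_PATH):
    """Restore one domain from a checkpoint file, one restore at a time."""
    return serialized_run(["xl", "restore", checkpoint_file], lock_path)
```

Each worker would call `restore_domain(...)` with its own checkpoint file and only start drakvuf once the call returns, so the analyses still run in parallel while the restores themselves happen one at a time.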