
Anyone experienced stalled CPU messages?

Open aoshiken opened this issue 7 years ago • 3 comments

Hi, has anyone experienced stalled CPU messages from time to time when running multiple analyses in parallel?

Something along these lines:

kernel: [12912.643298] INFO: rcu_sched self-detected stall on CPU { 17}  (t=5250 jiffies g=126982 c=126981 q=107473)
kernel: [12912.644896] NMI backtrace for cpu 17
kernel: [12912.644900] CPU: 17 PID: 8684 Comm: drakvuf Not tainted 3.16.0-4-amd64 #1 Debian 3.16.43-2+deb8u5
kernel: [12912.644904] task: ffff8809fa64f430 ti: ffff8809fc9fc000 task.ti: ffff8809fc9fc000
kernel: [12912.644907] RIP: e030:[<ffffffff8100130a>]  [<ffffffff8100130a>] xen_hypercall_vcpu_op+0xa/0x20
kernel: [12912.644913] RSP: e02b:ffff880cce623cb8  EFLAGS: 00000046
kernel: [12912.644915] RAX: 0000000000000000 RBX: 0000000000000011 RCX: ffffffff8100130a
kernel: [12912.644918] RDX: 0000000000000000 RSI: 0000000000000011 RDI: 000000000000000b
kernel: [12912.644920] RBP: ffffffff818e3060 R08: ffffffff818e2b40 R09: 000000000000331c
kernel: [12912.644923] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff818e2b40
kernel: [12912.644926] R13: 0000000000000005 R14: 000000000001a3d1 R15: ffffffff818539c0
kernel: [12912.644937] FS:  00007fa401aa2780(0000) GS:ffff880cce620000(0000) knlGS:0000000000000000
kernel: [12912.644939] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: [12912.644945] CR2: 000000000d97fa90 CR3: 00000008501da000 CR4: 0000000000042660

I'm just starting to nail down the issue, but any info would be appreciated. :)

Regards

aoshiken, Nov 28 '17 11:11

Running multiple analyses in parallel has caused issues for me in the past. I'm not sure whether it was this exact problem, though.

tklengyel, Nov 28 '17 15:11

Just in case, let me reword my previous explanation: I'm running multiple drakvuf instances in parallel, together with multiple xl restores and multiple VMI sessions (without drakvuf) via vmifs.

The main issue I'm facing right now is random reboots performed by the watchdog:

[...]
(XEN) Watchdog timer fired for domain 0
(XEN) Hardware Dom0 shutdown: watchdog rebooting machine
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

I'm currently running out of ideas. I tested Xen v4.9.1 but ended up with the same result... :(

aoshiken, Nov 28 '17 15:11

Running xl restore in parallel is problematic in my experience. While the toolstack should serialize it itself with a lock, I found it best to serialize it manually.

tklengyel, Nov 28 '17 15:11
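
For readers wondering what "serialize it manually" could look like in practice, here is a minimal sketch. It is not taken from the thread or from drakvuf. It assumes every analysis worker on the host launches its restore through the same wrapper and can open the same lock file. The lock path and the names exclusive_lock, serialized_run and restore_domain are hypothetical. The only xl invocation used is the plain "xl restore ConfigFile CheckpointFile" form.

```python
import fcntl
import subprocess
from contextlib import contextmanager

# Hypothetical lock file; any path that all workers can open would do.
LOCK_PATH = "/tmp/xl-restore.lock"


@contextmanager
def exclusive_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "w") as fh:
        # Blocks until no other process (or other open of the file) holds the lock.
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def serialized_run(cmd, lock_path=LOCK_PATH):
    """Run `cmd` while holding the lock, so only one such command runs at a time."""
    with exclusive_lock(lock_path):
        return subprocess.run(cmd, check=True)


def restore_domain(cfg_path, snapshot_path, lock_path=LOCK_PATH):
    """Restore one domain; concurrent callers queue up behind the lock."""
    return serialized_run(["xl", "restore", cfg_path, snapshot_path], lock_path)
```

With this shape, each worker would call restore_domain() instead of invoking xl directly. The restores then run one after another, while the analyses themselves can still proceed in parallel once their domain is up. Whether this avoids the watchdog reboots described above is not established by the thread.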