ignite icon indicating copy to clipboard operation
ignite copied to clipboard

Host deadlock under load

Open lukemarsden opened this issue 4 years ago • 1 comments

I have an ignite host that's routinely hanging under load. Host becomes completely unresponsive, doesn't ping, SSH connections freeze and time out, no messages on ipmi console. Ubuntu 20.04. AMD EPYC CPU, 24 cores (I think) with 128GB RAM. Bare metal obviously.

Workload is rapidly starting and stopping many ignite VMs. I've added a mutex so that ignite run never runs concurrently with itself or ignite rm. Has anyone seen anything like this?

How do you advise to diagnose the issue? I'd enable kdump but the kernel never panics, just hangs...

lukemarsden avatar Oct 27 '20 21:10 lukemarsden

Just noting from our previous convo in slack a few weeks ago:

i also made the system which is controlling ignite less aggressive and haven't seen any crashes yet - if i get no crashes on this server with the code changes, i'll see if it still crashes on the old server

Reducing the concurrent vm startup or runtime seems to have resolved some of this deadlock problem

stealthybox avatar Nov 16 '20 18:11 stealthybox