compute-runtime

Zombie processes

Open pvelesko opened this issue 2 years ago • 6 comments

Is there a way to reload/reset the driver to get rid of these dead processes? These are a result of running various Level Zero tests overnight.

image

pvelesko avatar Sep 30 '23 06:09 pvelesko

Hi @pvelesko What are these processes, who spawned them?

JablonskiMateusz avatar Oct 06 '23 06:10 JablonskiMateusz

We have a large set of tests that we run on the Level Zero backend. Some of these tests leave behind processes that I can't figure out how to kill. I tried unbinding the dGPU from the i915 driver and I tried kill -9, but so far the only thing that works is a forced reboot through sysrq:

alias reboot_hardcore="history -a && sudo sh -c 'echo b > /proc/sysrq-trigger'"

A regular reboot doesn't work either, as the system just hangs while waiting on something. Is there a way to reset the driver completely, given that we can't unload the driver while it's in use?

pvelesko avatar Oct 06 '23 08:10 pvelesko
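One way to narrow down what these processes are and who spawned them is to list their state and parent PID. The sketch below assumes a procps-style `ps`; the helper name `filter_stuck` is hypothetical and only used for illustration. It keeps processes whose state starts with Z (zombie) or D (uninterruptible sleep), since neither kind responds to kill -9. Which of the two applies here is not known from the thread.

```shell
#!/bin/sh
# List processes in zombie (Z) or uninterruptible-sleep (D) state together
# with their parent PID, so the spawning process can be identified.

# filter_stuck (hypothetical helper): reads "PID PPID STAT COMMAND" rows on
# stdin and keeps those whose STAT column begins with Z or D.
filter_stuck() {
    awk '$3 ~ /^[ZD]/ { print }'
}

# The trailing "=" on each column name suppresses the header line.
ps -eo pid=,ppid=,stat=,comm= | filter_stuck
```

A zombie has already exited and only disappears once its parent reaps it, so the PPID column is the place to look. A process in D state is blocked inside the kernel and will not act on any signal until that kernel call returns.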

Hello,

Have you tried unbinding i915 like this?

sudo sh -c "echo -n auto > /sys/bus/pci/devices/0000:00:02.0/power/control"
sudo sh -c "echo -n 0000:00:02.0 > /sys/bus/pci/drivers/i915/unbind"
sudo modprobe -r i915

The PCI address can be found with:

lspci | grep -i display

or

lspci | grep -i vga

HoppeMateusz avatar Oct 06 '23 08:10 HoppeMateusz
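The lookup and the unbind steps above can be combined into a dry run that prints the commands for every display-class device instead of executing them. This is a sketch only: the function names `gpu_addresses` and `print_unbind_commands` are hypothetical, and it assumes `lspci -D` is available so that each address includes the PCI domain prefix (for example `0000:`) used in the sysfs paths.

```shell
#!/bin/sh
# Dry run: print the unbind commands for each display-class PCI device
# instead of executing them.

# gpu_addresses (hypothetical helper): reads `lspci -D` output on stdin and
# prints the domain:bus:device.function address of VGA/display devices.
gpu_addresses() {
    grep -i -E 'vga|display' | awk '{ print $1 }'
}

# print_unbind_commands (hypothetical helper): prints the two sysfs writes
# from the comment above for a single PCI address.
print_unbind_commands() {
    addr=$1
    printf 'echo -n auto > /sys/bus/pci/devices/%s/power/control\n' "$addr"
    printf 'echo -n %s > /sys/bus/pci/drivers/i915/unbind\n' "$addr"
}

lspci -D | gpu_addresses | while read -r addr; do
    print_unbind_commands "$addr"
done
echo 'modprobe -r i915'
```

The printed commands need root to take effect, so review them first and then run them from a root shell.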

I'll try this, thank you.

pvelesko avatar Oct 06 '23 08:10 pvelesko

After unbinding the devices, modprobe -r i915 still fails with: modprobe: FATAL: Module i915 is in use.

pvelesko avatar Oct 07 '23 10:10 pvelesko

Hello, thanks for trying. Unloading i915 might not work if the KMD is hanging somewhere; a reboot should be the ultimate way of resetting.

Can you provide details for reproducing the issue? What platform is it? Which OS version and i915 version are you running? If possible, please also share a reproducer. Thanks.

HoppeMateusz avatar Oct 09 '23 07:10 HoppeMateusz
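The details requested above could be gathered with a small script like the following. It is a sketch under assumptions: the function name `collect_info` is hypothetical, `/etc/os-release` is assumed to follow the usual format with a `PRETTY_NAME` field, and `modinfo` and `lspci` are used only if they are installed.

```shell
#!/bin/sh
# Collect kernel, OS, i915 module, and GPU device details into one report.

# collect_info (hypothetical helper): prints one labelled line per item.
collect_info() {
    echo "kernel: $(uname -r)"
    # /etc/os-release is present on most current distributions.
    if [ -r /etc/os-release ]; then
        echo "os: $(. /etc/os-release && echo "$PRETTY_NAME")"
    fi
    # Path of the i915 module file that modinfo resolves for this kernel.
    if command -v modinfo >/dev/null 2>&1; then
        echo "i915 module: $(modinfo -F filename i915 2>/dev/null)"
    fi
    # Display-class PCI devices with their vendor:device IDs.
    if command -v lspci >/dev/null 2>&1; then
        echo "gpu devices:"
        lspci -nn | grep -i -E 'vga|display' || echo "(none found)"
    fi
}

collect_info
```

Redirecting the output to a file, for example `sh collect.sh > report.txt`, gives something that can be attached to the issue alongside a reproducer.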