Celluloid taking long time to open, stuck waiting for nvidia driver

Open Pesc0 opened this issue 2 years ago • 7 comments

Overview Description: When an NVIDIA GPU is reserved for VFIO passthrough, Celluloid takes a very long time to open. This apparently also happens if the NVIDIA driver is misconfigured; see #707.

Steps to Reproduce:

  1. Reserve the GPU for passthrough without uninstalling the NVIDIA drivers: add vfio-pci.ids=10de:1b81,10de:10f0 to the Linux kernel command-line options.

Actual Results: Celluloid takes a long time to launch

$ strace celluloid
...
close(18)                               = 0
geteuid()                               = 1000
stat("/usr/bin/nvidia-modprobe", {st_mode=S_IFREG|S_ISUID|0755, st_size=39232, ...}) = 0
geteuid()                               = 1000
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0f39c68810) = 254770
wait4(254770, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 254770
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=254770, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
openat(AT_FDCWD, "/proc/modules", O_RDONLY) = 18
newfstatat(18, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
...

...
close(18)                               = 0
geteuid()                               = 1000
stat("/usr/bin/nvidia-modprobe", {st_mode=S_IFREG|S_ISUID|0755, st_size=39232, ...}) = 0
geteuid()                               = 1000
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f0f39c68810) = 254801
wait4(254801, 

It seems to get stuck on wait4 and gets unstuck fairly quickly, but the call is retried many times, which adds up to a long launch time. I strongly suspect it's an NVIDIA issue, because nvidia-smi also takes a good second (about as long as one wait4 call) and then fails with:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

After launch everything works perfectly fine, so functionality is not compromised by this apparent misconfiguration. mpv itself launches almost instantly and is not affected by this.
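
One way to quantify the retries described above is to save the trace to a file and count how often the process checks for nvidia-modprobe. The following is only a sketch and was not part of the original report: the function name count_modprobe_probes and the file name trace.log are made up for illustration, and it assumes the trace was saved with strace's -o option.

```shell
# Count how many times a saved strace log shows a stat() of nvidia-modprobe.
# Hypothetical helper; the path matches the one seen in the trace above.
count_modprobe_probes() {
    log="$1"
    # -c prints the number of matching lines, -F treats the pattern literally.
    # Note: grep exits with status 1 when the count is 0.
    grep -cF 'stat("/usr/bin/nvidia-modprobe"' "$log"
}

# Example usage (not run here):
#   strace -o trace.log celluloid
#   count_modprobe_probes trace.log
```

A high count combined with roughly one second per wait4 call would be consistent with the slow launch reported here.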

Expected Results: Quick launch

Version: Celluloid 0.24

Pesc0, Nov 08 '22 23:11

If you run celluloid --mpv-gpu-hwdec-interop=no, does it open quickly?

gnome-mpv, Nov 09 '22 01:11

It still takes about 2 seconds, but it is definitely a lot faster. Running strace on it, I can see that it gets stuck on the exact same call, except that it doesn't retry many times.

Pesc0, Nov 09 '22 16:11

I am facing a similar problem:

  • Using NVIDIA driver 525.89.02. Opening Celluloid gets stuck every time, and celluloid --mpv-gpu-hwdec-interop=no does help.
  • nvidia-smi works on my machine.
  • A full gdb backtrace is available in gdb.txt; note that I am using Celluloid from the Fedora system repo.

However, my strace gets stuck on a futex.

karuboniru, Feb 18 '23 09:02

--mpv-gpu-hwdec-interop=vaapi worked for me

Explanation

libmpv loads GPU hwdec interop contexts differently from mpv. By default, mpv loads an interop context on demand, while libmpv tries to load all available interop contexts.

This makes Celluloid wait for the NVIDIA card to resume from the D3cold state, that is, for the NVIDIA GPU to come online.

Thus, adding mpv-gpu-hwdec-interop=vaapi and setting the VK_ICD_FILENAMES environment variable (if you are using gpu-api=vulkan) should fix this issue.
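
As a rough sketch of the workaround above, a small wrapper can start Celluloid with the VAAPI interop forced and, optionally, a specific Vulkan ICD file selected. The function name run_celluloid_vaapi and the CELLULOID_BIN override are hypothetical and exist only for illustration. The ICD path is deliberately left to the caller, since it depends on the distribution and on the GPU that should be used.

```shell
# Launch Celluloid with --mpv-gpu-hwdec-interop=vaapi.
# $1: path to a Vulkan ICD JSON file, or "" to leave VK_ICD_FILENAMES untouched.
# Remaining arguments are passed through to Celluloid.
# CELLULOID_BIN is a hypothetical override, useful for testing with a stub.
run_celluloid_vaapi() {
    icd="$1"
    shift
    bin="${CELLULOID_BIN:-celluloid}"
    if [ -n "$icd" ]; then
        # Only needed when gpu-api=vulkan is in use.
        VK_ICD_FILENAMES="$icd" "$bin" --mpv-gpu-hwdec-interop=vaapi "$@"
    else
        "$bin" --mpv-gpu-hwdec-interop=vaapi "$@"
    fi
}

# Example usage (not run here):
#   run_celluloid_vaapi "" video.mkv
```

Passing an empty first argument keeps the environment unchanged, which matches setups that do not use gpu-api=vulkan.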

Verifying

watch cat /sys/class/drm/renderD12*/device/power_state

There should be a line containing D3cold
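
For a one-off check instead of watch, the same sysfs files can be read in a loop. This is only a sketch: the function name show_gpu_power_states is made up, and the optional base-directory argument is an assumption added so that the function can be tried against a test directory.

```shell
# Print "<render node> <power state>" for every DRM render node that
# exposes a power_state file. Defaults to /sys/class/drm.
show_gpu_power_states() {
    base="${1:-/sys/class/drm}"
    for node in "$base"/renderD*; do
        state_file="$node/device/power_state"
        # Skip nodes without a readable power_state file
        # (this also covers the case where the glob matches nothing).
        [ -r "$state_file" ] || continue
        printf '%s %s\n' "$(basename "$node")" "$(cat "$state_file")"
    done
}

# Example usage (not run here):
#   show_gpu_power_states
```

Running it while Celluloid starts should show whether the discrete GPU stays in D3cold.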

For anyone interested in my Celluloid config file, see https://github.com/Kimiblock/moeOS.config/blob/master/usr/share/moeOS-Docs/Celluloid.d/celluloid.options

Kimiblock, Aug 24 '23 07:08

Same problem with an AMD Ryzen 7840HS and its Phoenix 780M integrated GPU.

How can I apply the argument by default for desktop integration? And is it possible to make libmpv work just like mpv and use the default mpv.conf rather than separate logic?

I just found that gpu-hwdec-interop is a standalone mpv config option that can be set in mpv.conf and takes priority over hwdec.

rtgiskard, Nov 26 '23 06:11

@rtgiskard You can configure Celluloid to load an mpv.conf file under Preferences -> Config Files. You can select the mpv.conf you're using for mpv if you already have one.
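
If the option should live in mpv.conf, something like the following could add it without duplicating the line on repeated runs. This is a sketch under assumptions: the helper name ensure_mpv_option is made up, and ~/.config/mpv/mpv.conf is only the usual per-user location, so adjust the path to whichever file is selected in Celluloid's preferences.

```shell
# Append "key=value" to an mpv config file unless the key is already set.
# An existing line for the key is left untouched, even if its value differs.
ensure_mpv_option() {
    conf="$1"
    key="$2"
    value="$3"
    mkdir -p "$(dirname "$conf")"
    touch "$conf"
    if grep -q "^${key}=" "$conf"; then
        return 0
    fi
    # mpv.conf entries are written without the leading "--".
    printf '%s=%s\n' "$key" "$value" >> "$conf"
}

# Example usage (not run here):
#   ensure_mpv_option "$HOME/.config/mpv/mpv.conf" gpu-hwdec-interop vaapi
```

Because the same file is read by mpv itself, the setting then applies to both players.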

gnome-mpv, Nov 26 '23 07:11

> Same problem with an AMD Ryzen 7840HS and its Phoenix 780M integrated GPU.
>
> How can I apply the argument by default for desktop integration? And is it possible to make libmpv work just like mpv and use the default mpv.conf rather than separate logic?
>
> I just found that gpu-hwdec-interop is a standalone mpv config option that can be set in mpv.conf and takes priority over hwdec.

DConf will do the job.

See https://github.com/Kimiblock/moeOS.config/blob/master/etc/dconf/db/local.d/04-Celluloid-VAAPI

Kimiblock, Nov 26 '23 22:11