smithay

WIP Enable `IN_FENCE_FD` on Nvidia 560 driver

Open · ids1024 opened this issue 1 year ago · 1 comment

In theory this should work, but I'm seeing issues with it on the 560 beta driver.

The 560 driver adds IN_FENCE_FD support, fixing https://github.com/NVIDIA/open-gpu-kernel-modules/issues/622, so the check added in https://github.com/Smithay/smithay/pull/1437 to disable IN_FENCE_FD on Nvidia should no longer be needed if the driver is new enough. Reading `/sys/module/nvidia_drm/version` seems like a reasonable way to check the driver version, although FreeBSD may need a different check.
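A minimal Rust sketch of the sysfs version check described above, assuming the file holds a dotted version string such as `560.35.03`. The function names and the `MIN_IN_FENCE_FD_VERSION` constant are hypothetical, not Smithay's actual API; the threshold is taken from the 560.35.03 stable release mentioned later in this thread. The path is Linux-only, so FreeBSD would still need its own check.

```rust
use std::fs;

/// Hypothetical minimum driver version assumed to have working IN_FENCE_FD
/// support (the 560.35.03 stable release discussed in this thread).
const MIN_IN_FENCE_FD_VERSION: (u32, u32, u32) = (560, 35, 3);

/// Parses a dotted version such as "560.35.03" into (major, minor, patch).
/// Missing minor/patch components default to 0, extra components are ignored,
/// and any non-numeric component makes the whole parse fail.
fn parse_nvidia_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.trim().split('.').map(|p| p.parse::<u32>());
    let major = parts.next()?.ok()?;
    let minor = match parts.next() {
        Some(r) => r.ok()?,
        None => 0,
    };
    let patch = match parts.next() {
        Some(r) => r.ok()?,
        None => 0,
    };
    Some((major, minor, patch))
}

/// Reads the loaded nvidia_drm module version from sysfs (Linux only).
/// Returns None if the module is not loaded or the contents can't be parsed.
fn nvidia_drm_version() -> Option<(u32, u32, u32)> {
    let raw = fs::read_to_string("/sys/module/nvidia_drm/version").ok()?;
    parse_nvidia_version(&raw)
}

/// True only if a version is known and is at least the assumed minimum.
/// Tuples compare lexicographically, so (565, 0, 0) > (560, 35, 3).
fn in_fence_fd_supported(version: Option<(u32, u32, u32)>) -> bool {
    matches!(version, Some(v) if v >= MIN_IN_FENCE_FD_VERSION)
}

fn main() {
    let version = nvidia_drm_version();
    match version {
        Some((major, minor, patch)) => println!(
            "nvidia_drm {}.{}.{:02}: IN_FENCE_FD supported = {}",
            major,
            minor,
            patch,
            in_fence_fd_supported(version)
        ),
        None => println!("nvidia_drm version unavailable; keeping IN_FENCE_FD disabled"),
    }
}
```

Treating an unknown or unreadable version as "unsupported" keeps the existing workaround from #1437 in place by default, so only drivers known to be new enough would re-enable IN_FENCE_FD.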

Testing on cosmic-comp with https://github.com/pop-os/nvidia-graphics-drivers/pull/210 (using either open or proprietary kernel modules).

  • `needs_sync()` is now false, without causing corrupted rendering when an Intel GPU is used for rendering on an Nvidia output.
  • I'm still seeing the Intel surface on cosmic-comp limited to 30fps on a 1650 mobile, so whatever causes that (it also happens on GNOME Shell) is unrelated to this change. I expected as much, but hoped this might improve things.
  • But... this still seems buggy.

After running for a while with the Intel GPU rendering on the Nvidia output, that output freezes, and the driver logs to dmesg:

```
Jul 23 12:38:39 pop-os kernel: [drm:nv_drm_atomic_apply_modeset_config.isra.0 [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to initialize semaphore for plane fence
Jul 23 12:38:39 pop-os kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -11
```

Since the kernel modules are open source, we can at least see the code producing this error:

https://github.com/NVIDIA/open-gpu-kernel-modules/blob/448d5cc65624d3aa69015efa0d3fb50fd9729f41/kernel-open/nvidia-drm/nvidia-drm-modeset.c#L249-L258

That code carries a comment: "This should only happen if the semaphore pool was somehow exhausted. Waiting a bit and retrying may help in that case."

I don't think Smithay/cosmic-comp is doing anything wrong to trigger this, so I guess it's a bug in the Nvidia driver? Hm...

ids1024, Jul 23 '24 22:07

On the Hyprland side, we see similar display freezes on the 560 drivers with explicit sync and IN_FENCE_FD enabled. I've commented on the issue in the Nvidia repo; let's hope for some sort of reply from the Nvidia team.

fxzzi, Aug 18 '24 14:08

It's mentioned on https://github.com/NVIDIA/open-gpu-kernel-modules/issues/622 that this was fixed. And indeed, I can't reproduce the bug after updating to 560.35.03 stable.

So if nothing else comes up, it looks like we can merge this now. I've updated the version requirement to the stable 560 driver.

ids1024, Sep 12 '24 21:09