Atomic commit fails if IN_FENCE_FD is set
NVIDIA Open GPU Kernel Modules Version
550.40.59
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [ ] I confirm that this does not happen with the proprietary driver package.
Operating System and Version
n/a
Kernel Release
n/a
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [ ] I am running on a stable kernel release.
Hardware: GPU
n/a
Describe the bug
When an atomic commit contains an IN_FENCE_FD, the nvidia driver rejects the commit with EPERM:
if (plane_state->fence != NULL || nv_drm_plane_state->fd_user_ptr) {
    if (!nv_dev->supportsSyncpts) {
        return -1;
    }
-1 is -EPERM.
First of all, this is not a correct error code for this situation.
But more importantly, this makes it impossible to synchronize presentation with rendering from the compositor side. The workaround is to just not use IN_FENCE_FD if the nvidia driver is detected. Kinda absurd since nvidia was the vendor pushing for explicit sync in the first place.
If that is the expected workaround, then maybe the driver should just accept such commits instead of returning an error.
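For reference, a rough sketch of what that compositor-side workaround looks like with libdrm (the helper names and the plane/property ID parameters are hypothetical placeholders; the only real calls are drmGetVersion/drmFreeVersion and drmModeAtomicAddProperty):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Check whether the DRM device is driven by nvidia-drm. */
static bool is_nvidia_drm(int drm_fd)
{
    drmVersionPtr ver = drmGetVersion(drm_fd);
    bool nvidia = ver && strcmp(ver->name, "nvidia-drm") == 0;
    drmFreeVersion(ver);
    return nvidia;
}

/* Attach the render-finished fence to the plane, except on nvidia-drm,
 * where IN_FENCE_FD currently makes the whole atomic commit fail. */
static void add_plane_in_fence(drmModeAtomicReq *req, int drm_fd,
                               uint32_t plane_id, uint32_t in_fence_prop_id,
                               int fence_fd)
{
    if (is_nvidia_drm(drm_fd)) {
        /* Workaround: skip IN_FENCE_FD and fall back to a CPU wait
         * (or implicit sync) before submitting the commit. */
        return;
    }
    drmModeAtomicAddProperty(req, plane_id, in_fence_prop_id,
                             (uint64_t)fence_fd);
}
```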
For whatever reason, one of my users reports that the commits are nevertheless applied by the nvidia driver even though the atomic commit fails. This might be a separate bug.
To Reproduce
n/a
Bug Incidence
Always
nvidia-bug-report.log.gz
n/a
More Info
I don't have nvidia hardware so I cannot fill out the remaining fields.
@cubanismo is working on adding support for IN_FENCE_FD and OUT_FENCE_PTR to our nvidia-drm module, so these should be working properly in a near-future driver release. By "properly" I mean leveraging the synchronization capabilities of the GPU's display hardware.
Having said that, I checked with him and we agreed that our current behavior of failing when IN_FENCE_FD is specified is wrong (and, yeah, returning -EPERM is extra wrong). Instead, we should be calling drm_atomic_helper_wait_for_fences to do a CPU wait on the fence. This is what the nouveau driver does, for example.
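For illustration, a minimal sketch of what that CPU-wait fallback looks like in a helper-based commit path (the function name is hypothetical and this is not the actual nvidia-drm code; only the drm_atomic_helper_* calls are real kernel API):

```c
#include <drm/drm_atomic_helper.h>

/* Hypothetical commit_tail implementation: do a CPU wait on any
 * IN_FENCE_FD fences before programming the hardware, so commits
 * carrying an in-fence can be accepted even without display-hardware
 * fence support. */
static void example_atomic_commit_tail(struct drm_atomic_state *state)
{
        struct drm_device *dev = state->dev;

        /* Blocks until every plane's in-fence (if any) has signaled. */
        drm_atomic_helper_wait_for_fences(dev, state, false);

        drm_atomic_helper_commit_modeset_disables(dev, state);
        drm_atomic_helper_commit_planes(dev, state, 0);
        drm_atomic_helper_commit_modeset_enables(dev, state);

        drm_atomic_helper_commit_hw_done(state);
        drm_atomic_helper_wait_for_vblanks(dev, state);
        drm_atomic_helper_cleanup_planes(dev, state);
}
```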
As you pointed out, there isn't really a good way for userspace to determine whether IN_FENCE_FD is supported, implying the intention was probably for it to always be supported, at worst falling back to a CPU wait as described above.
If we had noticed this problem earlier we would most likely have made such a change as a short-term fix. However, at this point, since a better fix is right around the corner, there may not be any benefit to doing that.
Thanks for the update.
PS: The issue I mentioned re "commit being applied anyway" turned out to be a bug on my side where I would render directly to the front buffer if too many commits failed in a row.
It's nice to see this added in the 560.28.03 beta. However, testing this with cosmic-comp/smithay (https://github.com/Smithay/smithay/pull/1489) on an Intel+NVIDIA graphics laptop, it seems to run into an issue and logs "Failed to initialize semaphore for plane fence" to dmesg. I don't think the compositor is doing anything wrong here.
I can confirm that whilst this was added in the 560 drivers, it does not seem to be implemented fully correctly, as ids1024 mentioned. I am getting constant freezes and display lockups, which appear to be caused by an issue with the implementation.
@ids1024, @Fxzzi, were you using more than one display? We identified a bug in the implementation in some situations on Wayland desktops with multiple displays. It should be resolved in subsequent 560-series drivers.
Here is the relevant forum post for the fix referred to by @cubanismo.
4769163 - Screen freeze observed with directly scanned out windows on a multi-monitor configuration
https://forums.developer.nvidia.com/t/560-release-feedback-discussion/300830/210
> @ids1024, @Fxzzi, were you using more than one display? We identified a bug in the implementation in some situations on Wayland desktops with multiple displays. It should be resolved in subsequent 560-series drivers.
I am personally using 2 displays. Hopefully the issue you've resolved is the same as what I was experiencing.
We released driver 560.35.03 yesterday, which contains the fix we are referring to for multiple displays.
https://www.nvidia.com/en-us/drivers/details/230918/
Fix released as @Binary-Eater mentions above.
https://forums.developer.nvidia.com/t/560-release-feedback-discussion/300830/309 Internal bug 4769163