soc_isr_lock is missing a NV_SPIN_LOCK_INIT

Open akeshet opened this issue 1 year ago • 2 comments

NVIDIA Open GPU Kernel Modules Version

535.54.03

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • [X] I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Linux (custom distribution)

Kernel Release

5.15 customized in house

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • [X] I am running on a stable kernel release.

Hardware: GPU

Irrelevant

Describe the bug

In kernel-open/nvidia/nv.c we initialize the locks of an nv_linux_state_t. However, one of the locks (soc_isr_lock) is never initialized. This can be fixed with the following patch:

diff --git a/kernel-open/nvidia/nv.c b/kernel-open/nvidia/nv.c
index d81122d..3c7912d 100644
--- a/kernel-open/nvidia/nv.c
+++ b/kernel-open/nvidia/nv.c
@@ -3580,6 +3580,7 @@ NvBool nv_lock_init_locks
 
     NV_INIT_MUTEX(&nvl->ldata_lock);
     NV_INIT_MUTEX(&nvl->mmap_lock);
+    NV_SPIN_LOCK_INIT(&nvl->soc_isr_lock);
 
     NV_ATOMIC_SET(nvl->usage_count, 0);

This issue was discovered via the Linux lockdep tool. It likely has no user-facing consequences, but it does interfere with lockdep's ability to run: when these drivers are loaded, lockdep reports an error and disables itself at boot time.
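For readers unfamiliar with the pattern, here is a minimal userspace sketch of "initialize every lock in the state struct before first use". It is an analogy only, not the driver's code: `demo_spinlock`, `demo_state`, and the `demo_*` helpers are hypothetical names, and a C11 `atomic_flag` stands in for the kernel's spinlock type. In the kernel, the spinlock init call is also what gives lockdep a lock class for the object, which is presumably why skipping it triggers the "non-static key" message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace spinlock; stands in for the kernel's spinlock_t. */
typedef struct {
    atomic_flag held;
} demo_spinlock;

/* Put the lock into a known "unlocked" state. Without this call the
 * flag's value is indeterminate, just like an uninitialized kernel lock. */
static void demo_spin_init(demo_spinlock *l)
{
    atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool demo_spin_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l)) {
        /* busy-wait until the holder releases the lock */
    }
}

static void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical stand-in for nv_linux_state_t. The field names mirror the
 * issue; the real struct uses mutexes for the first two locks. */
struct demo_state {
    demo_spinlock ldata_lock;
    demo_spinlock mmap_lock;
    demo_spinlock soc_isr_lock;
    int usage_count;
};

/* Initialize every lock in one place so none can be forgotten. */
static void demo_init_locks(struct demo_state *st)
{
    demo_spin_init(&st->ldata_lock);
    demo_spin_init(&st->mmap_lock);
    demo_spin_init(&st->soc_isr_lock); /* the call missing in the report */
    st->usage_count = 0;
}

/* Example of a path that takes the ISR lock after initialization. */
static int demo_isr_touch(struct demo_state *st)
{
    demo_spin_lock(&st->soc_isr_lock);
    int seen = ++st->usage_count;
    demo_spin_unlock(&st->soc_isr_lock);
    return seen;
}
```

A caller would run `demo_init_locks(&st)` once, right after allocating the state, and only then let any path such as `demo_isr_touch(&st)` take a lock.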

To Reproduce

Boot a machine with lockdep instrumentation enabled, load the driver, and observe the dmesg log:

[   45.559931] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:8a:00.0 on minor 3
[   47.312278] INFO: trying to register non-static key.
[   47.312280] The code is fine but needs lockdep annotation, or maybe
[   47.312281] you didn't initialize this object before use?
[   47.312282] turning off the locking correctness validator.

Bug Incidence

Always

nvidia-bug-report.log.gz

N/A as this bug only affects kernel instrumentation.

More Info

No response

akeshet, May 17 '24 18:05

Hey, good catch, thanks for the report! We do occasionally run with lockdep for dGPU, but I guess we don't on Tegra.

If you open a PR and sign the CLA, we can merge the patch in a way that will show you as a contributor. Otherwise, I can just silently fix it internally. (either way it'll be several months before it makes its way through our release/QA pipeline)

mtijanic, May 22 '24 15:05

It's a bit of an involved process for me to get CLA signing approval, so go ahead and fix it internally. Thanks!

akeshet, May 24 '24 23:05