soc_isr_lock is missing a NV_SPIN_LOCK_INIT
NVIDIA Open GPU Kernel Modules Version
535.54.03
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [X] I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Linux (custom distribution)
Kernel Release
5.15 customized in house
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [X] I am running on a stable kernel release.
Hardware: GPU
Irrelevant
Describe the bug
In kernel-open/nvidia/nv.c, nv_lock_init_locks initializes the locks of an nv_linux_state_t. However, one of the locks (soc_isr_lock) is never initialized. This can be fixed with the following patch:
diff --git a/kernel-open/nvidia/nv.c b/kernel-open/nvidia/nv.c
index d81122d..3c7912d 100644
--- a/kernel-open/nvidia/nv.c
+++ b/kernel-open/nvidia/nv.c
@@ -3580,6 +3580,7 @@ NvBool nv_lock_init_locks
NV_INIT_MUTEX(&nvl->ldata_lock);
NV_INIT_MUTEX(&nvl->mmap_lock);
+ NV_SPIN_LOCK_INIT(&nvl->soc_isr_lock);
NV_ATOMIC_SET(nvl->usage_count, 0);
This issue was discovered via the Linux lockdep tool. It likely has no user-facing consequences, but it does interfere with lockdep itself: when these modules are loaded, lockdep hits the uninitialized lock, reports an error, and turns itself off at boot time, so no further lock validation happens on that boot.
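To illustrate why a missing init call trips the validator, here is a minimal userspace sketch. It is a toy model, not the kernel's lockdep code: every name in it (toy_lock, TOY_LOCK_INIT, toy_lock_acquire, and so on) is hypothetical, and it assumes the lock lives in zeroed memory so that an uninitialized lock simply has no key attached.

```c
/* Toy userspace model of a lock validator. NOT the kernel implementation;
 * all identifiers below are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the per-call-site key that an init macro attaches to a lock. */
struct toy_lock_key { int unused; };

struct toy_lock {
    const struct toy_lock_key *key; /* NULL until the init helper has run */
    bool held;
};

/* Once the validator sees a lock it cannot classify, it gives up for good. */
static bool toy_validator_enabled = true;

static void toy_lock_init(struct toy_lock *lock, const struct toy_lock_key *key)
{
    assert(lock != NULL && key != NULL);
    lock->key = key;
    lock->held = false;
}

/* Plays the role of a spin-lock init macro: each use site gets one static key. */
#define TOY_LOCK_INIT(lock)                                 \
    do {                                                    \
        static const struct toy_lock_key toy_key = { 0 };   \
        toy_lock_init((lock), &toy_key);                    \
    } while (0)

/* Takes the lock. Returns true if the validator is still active afterwards. */
static bool toy_lock_acquire(struct toy_lock *lock)
{
    assert(lock != NULL);
    if (toy_validator_enabled && lock->key == NULL) {
        /* Assumption: a keyless lock means the init call was skipped. */
        fprintf(stderr, "toy validator: lock %p has no key, turning off\n",
                (void *)lock);
        toy_validator_enabled = false;
    }
    lock->held = true; /* the lock itself still works, as in the report */
    return toy_validator_enabled;
}

static void toy_lock_release(struct toy_lock *lock)
{
    assert(lock != NULL);
    lock->held = false;
}
```

In this sketch, acquiring a lock that went through TOY_LOCK_INIT leaves the validator running, while the first acquire of a zeroed, never-initialized lock still succeeds but switches the validator off for every later lock. That matches the shape of the symptom described above: the missed init costs the instrumentation, not the locking.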
To Reproduce
Boot a machine with lockdep instrumentation enabled, load these drivers, and observe the dmesg log:
[ 45.559931] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:8a:00.0 on minor 3
[ 47.312278] INFO: trying to register non-static key.
[ 47.312280] The code is fine but needs lockdep annotation, or maybe
[ 47.312281] you didn't initialize this object before use?
[ 47.312282] turning off the locking correctness validator.
Bug Incidence
Always
nvidia-bug-report.log.gz
N/A as this bug only affects kernel instrumentation.
More Info
No response
Hey, good catch, thanks for the report! We do occasionally run with lockdep for dGPU, but I guess we don't on Tegra.
If you open a PR and sign the CLA, we can merge the patch in a way that will show you as a contributor. Otherwise, I can just silently fix it internally. (Either way, it'll be several months before it makes its way through our release/QA pipeline.)
It's a bit of an involved process for me to get CLA signing approval, so go ahead and fix it internally. Thanks!