Potential bug regarding RMAPI_GPU_LOCK_INTERNAL usage in _createOrReuseVidmemInfoPersistent
NVIDIA Open GPU Kernel Modules Version
565.57.01-p2p
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [x] I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Description: Ubuntu 22.04.5 LTS
Kernel Release
Linux jmkernel 5.15.0-126-generic #136-Ubuntu SMP Wed Nov 6 10:38:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [x] I am running on a stable kernel release.
Hardware: GPU
GPU 0: NVIDIA A40 (UUID: GPU-f1654204-ae9d-31d9-da35-2e59c60cd8e4)
Describe the bug
The call path is:
rm_p2p_get_pages_persistent (calls rmapiLockAcquire() to acquire the API lock)
=> RmP2PGetPagesPersistent
=> _createOrReuseVidmemInfoPersistent
At the beginning of _createOrReuseVidmemInfoPersistent(), there is this code:
RM_API *pRmApi = rmapiGetInterface(RMAPI_GPU_LOCK_INTERNAL);
and the definition of RMAPI_GPU_LOCK_INTERNAL is commented as:
RMAPI_GPU_LOCK_INTERNAL, // For clients that have TLS, API lock, and GPU lock -- security is RM internal
If I understand correctly, using RMAPI_GPU_LOCK_INTERNAL means the caller is assumed to already hold both the API lock and the GPU lock. However, from reading the code, it seems that before _createOrReuseVidmemInfoPersistent is called, only the API lock has been acquired; the GPU lock has not.
So I wonder whether it should be changed to:
RM_API *pRmApi = rmapiGetInterface(RMAPI_API_LOCK_INTERNAL);
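The mismatch can be modeled as a small lock-contract check. This is a hypothetical sketch: the HOLDS_* flags, the model_rmapi_type enum, and the helper functions are invented for illustration and are not RM code. Reading RMAPI_API_LOCK_INTERNAL as "API lock held" is an assumption based on its name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit flags for the locks a caller holds (illustrative only). */
enum { HOLDS_API_LOCK = 1u << 0, HOLDS_GPU_LOCK = 1u << 1 };

/* Hypothetical mirror of two RMAPI_TYPE values; not the real enum. */
typedef enum {
    MODEL_API_LOCK_INTERNAL, /* assumed: caller already holds the API lock */
    MODEL_GPU_LOCK_INTERNAL, /* per its comment: API lock and GPU lock held */
} model_rmapi_type;

/* Locks each interface type assumes the caller already holds. */
static unsigned required_locks(model_rmapi_type type)
{
    switch (type) {
    case MODEL_API_LOCK_INTERNAL:
        return HOLDS_API_LOCK;
    case MODEL_GPU_LOCK_INTERNAL:
        return HOLDS_API_LOCK | HOLDS_GPU_LOCK;
    default:
        assert(!"unknown interface type");
        return 0;
    }
}

/* True when every lock the interface assumes is actually held. */
static bool contract_satisfied(model_rmapi_type type, unsigned held)
{
    unsigned need = required_locks(type);
    return (held & need) == need;
}

/* Locks held on entry to _createOrReuseVidmemInfoPersistent along the path
 * rm_p2p_get_pages_persistent -> RmP2PGetPagesPersistent, as read in this
 * report: only the API lock, taken via rmapiLockAcquire(). */
static unsigned locks_held_on_p2p_path(void)
{
    return HOLDS_API_LOCK;
}
```

With these definitions, contract_satisfied(MODEL_GPU_LOCK_INTERNAL, locks_held_on_p2p_path()) is false, while the same check against MODEL_API_LOCK_INTERNAL is true, which is the mismatch described above.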
Thanks for your time.
To Reproduce
It may not be a bug; I found this issue by reading the code.
Bug Incidence
Once
nvidia-bug-report.log.gz
Not attached (the issue was found by code reading).
More Info
No response
Hey there. Thanks, this definitely looks like a bug, although I'm not sure the suggested fix is correct; more likely, the API needs to take the GPU lock as well.
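The alternative of taking the GPU lock on this path too, rather than switching the interface type, can be sketched with pthread mutexes standing in for the RM locks. Everything here is hypothetical: api_lock, gpu_lock, the *_model functions, and the API-before-GPU acquisition order are assumptions for illustration, not the driver's actual locking API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the RM API lock and GPU lock. */
static pthread_mutex_t api_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t gpu_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread record of which stand-in locks this thread holds. */
static _Thread_local bool api_held;
static _Thread_local bool gpu_held;

static void api_lock_acquire(void) { pthread_mutex_lock(&api_lock); api_held = true; }
static void api_lock_release(void) { api_held = false; pthread_mutex_unlock(&api_lock); }

static void gpu_lock_acquire(void)
{
    /* Ordering assumption: the GPU lock is only taken while holding the API lock. */
    assert(api_held);
    pthread_mutex_lock(&gpu_lock);
    gpu_held = true;
}

static void gpu_lock_release(void) { gpu_held = false; pthread_mutex_unlock(&gpu_lock); }

/* Stand-in for code that obtains an RMAPI_GPU_LOCK_INTERNAL-style interface:
 * it is only valid with both locks held. Returns 0 on success, -1 otherwise. */
static int create_or_reuse_vidmem_info_model(void)
{
    if (!api_held || !gpu_held)
        return -1; /* lock contract violated */
    /* ... work that relies on both locks would go here ... */
    return 0;
}

/* Entry point with the GPU lock added after the API lock; locks are
 * released in reverse order of acquisition. */
static int p2p_get_pages_persistent_model(void)
{
    int status;

    api_lock_acquire();
    gpu_lock_acquire();
    status = create_or_reuse_vidmem_info_model();
    gpu_lock_release();
    api_lock_release();
    return status;
}
```

Calling p2p_get_pages_persistent_model() returns 0 and leaves both locks released, while calling create_or_reuse_vidmem_info_model() directly, with no locks held, returns -1.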
We have some tooling to detect these, but it generated plenty of false positives, so we never enabled it. Maybe it's time to resurrect it.
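For illustration only, one common shape for this kind of check is a per-thread record of held locks plus an assertion wherever an interface with lock assumptions is obtained. This is a hypothetical sketch, not NVIDIA's tooling: note_acquire, note_release, check_locks_held, and the HELD_* flags are invented names. It reports violations instead of aborting, so that false positives would not stop execution.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical lock flags; not NVIDIA's internal tooling. */
enum { HELD_API = 1u << 0, HELD_GPU = 1u << 1 };

static _Thread_local unsigned held_mask; /* locks held by this thread */
static unsigned violation_count;         /* reports so far (not thread-safe; sketch only) */

static void note_acquire(unsigned lock) { held_mask |= lock; }

static void note_release(unsigned lock)
{
    assert(held_mask & lock); /* releasing a lock that is not held is itself a bug */
    held_mask &= ~lock;
}

/* Called where an interface with lock assumptions is obtained.
 * Logs and counts instead of aborting. */
static void check_locks_held(unsigned required, const char *site)
{
    unsigned missing = required & ~held_mask;
    if (missing != 0) {
        violation_count++;
        fprintf(stderr, "lock check: %s is missing lock mask 0x%x\n", site, missing);
    }
}
```

Holding only HELD_API and calling check_locks_held(HELD_API | HELD_GPU, "_createOrReuseVidmemInfoPersistent") records one violation, which matches the situation described in this report.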