open-gpu-kernel-modules
open-gpu-kernel-modules copied to clipboard
Fix kernel memory leak in pNotifShare
380 if (pNotifierClient != NULL) 381 { 382 ¦ RsResourceRef *pNotifierRef; 383 ¦ INotifier *pNotifier; 384 ¦ if (clientGetResourceRef(pNotifierClient, hNotifierResource, &pNotifierRef) != NV_OK) 385 ¦ ¦ return NV_ERR_INVALID_OBJECT; 386 387 ¦ pNotifier = dynamicCast(pNotifierRef->pResource, INotifier); 388 ¦ if (pNotifier == NULL) 389 ¦ ¦ return NV_ERR_INVALID_OBJECT; 390 391 ¦ rmStatus = inotifyGetOrAllocNotifShare(pNotifier, hNotifierClient, hNotifierResource, &pNotifierShare); 392 ¦ if (rmStatus != NV_OK) 393 ¦ ¦ return rmStatus; 394 395 ¦ *pppEventNotification = inotifyGetNotificationListPtr(pNotifierShare->pNotifier); 396 } 397 398 serverRefShare(&g_resServ, staticCast(pNotifierShare, RsShared)); <------- 399 pEvent->pNotifierShare = pNotifierShare;
line 391 inotifyGetOrAllocNotifShare() initializes pNotifierShare and sets pNotifierShare->refCount to 1. line 398 serverRefShare() accidentally increments pNotifierShare0->refCount by 1, and it's 2 now. So pNotifier->refCount would never reach zero and leak forever. This issue can be fixed by my commit.
@Cwndmiao Just curious from end-user perspective. How severe or often does this leak occur? Trying to figure the severity to see if we want to apply this patch internally pre merge/review. Thanks.
@Cwndmiao Just curious from end-user perspective. How severe or often does this leak occur? Trying to figure the severity to see if we want to apply this patch internally pre merge/review. Thanks.
AFAIK, one NotifierShare binds one/many Event/OsEvent to one Subdevice, so that Event gets notified when something is triggered on one Subdevice/GPU, known as the Observer Design Pattern. To conclude, one NotifierShare leaks once Subdevice gets constructed/destructed, or more concisely, one app linked to libcuda.so spawns/exits.
And this can be detected by the SMART NVPORT memory allocator.