open-gpu-kernel-modules

Nv04Unmap: ummap failed

Open ryao opened this issue 3 years ago • 7 comments

NVIDIA Driver Version: 515.43.04

GPU: RTX 3080

Describe the bug: I see the following in dmesg:

[   39.219138] NVRM rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[   39.219922] NVRM serverFreeResourceTree: hObject 0xbeee0100 not found for client 0xc1d0003c
[   39.225971] NVRM rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[   39.226877] NVRM serverFreeResourceTree: hObject 0xbeef0100 not found for client 0xc1d0003c
[   39.892724] NVRM rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[   39.893505] NVRM serverFreeResourceTree: hObject 0xbeee0100 not found for client 0xc1d0003d
[   39.899565] NVRM rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[   39.900542] NVRM serverFreeResourceTree: hObject 0xbeef0100 not found for client 0xc1d0003d

More appear when I run nvidia-bug-report.sh. My display locked up on my first attempt to report this.
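The `serverFreeResourceTree` lines above can be grouped by client to see whether the same object handles keep recurring. This is a minimal sketch that assumes the log format shown in this report; the function name `handles_by_client` and the `SAMPLE` constant are made up for illustration and are not part of any NVIDIA tool.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [   39.219922] NVRM serverFreeResourceTree: hObject 0xbeee0100 not found for client 0xc1d0003c
HOBJECT_RE = re.compile(
    r"hObject (0x[0-9a-fA-F]+) not found for client (0x[0-9a-fA-F]+)"
)


def handles_by_client(dmesg_text):
    """Map each client handle to the sorted object handles reported as missing."""
    missing = defaultdict(set)
    for line in dmesg_text.splitlines():
        match = HOBJECT_RE.search(line)
        if match:
            h_object, client = match.groups()
            missing[client].add(h_object)
    return {client: sorted(objs) for client, objs in missing.items()}


# Lines copied from the dmesg excerpt in this report.
SAMPLE = """\
[   39.219922] NVRM serverFreeResourceTree: hObject 0xbeee0100 not found for client 0xc1d0003c
[   39.226877] NVRM serverFreeResourceTree: hObject 0xbeef0100 not found for client 0xc1d0003c
[   39.893505] NVRM serverFreeResourceTree: hObject 0xbeee0100 not found for client 0xc1d0003d
[   39.900542] NVRM serverFreeResourceTree: hObject 0xbeef0100 not found for client 0xc1d0003d
"""

if __name__ == "__main__":
    for client, objs in handles_by_client(SAMPLE).items():
        print(client, objs)
```

On the excerpt above, each client reports the same two handles, 0xbeee0100 and 0xbeef0100. To run it against a live system, pass it the text of `dmesg` instead of `SAMPLE`.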

To Reproduce: I assume this is IOMMU-related.

Expected behavior: These errors should not appear in dmesg.

Please reproduce the problem, run nvidia-bug-report.sh, and attach the resulting nvidia-bug-report.log.gz. nvidia-bug-report.log.gz

ryao, May 13 '22 03:05

If it matters, I am using KDE 5.24.5.

ryao, May 13 '22 03:05

I spotted this in dmesg following additional unmap failures:

[  220.153967] NVRM rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[  220.154960] NVRM serverFreeResourceTree: hObject 0xbeee0100 not found for client 0xc1d000f8
[  220.162233] NVRM rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[  220.163073] NVRM serverFreeResourceTree: hObject 0xbeef0100 not found for client 0xc1d000f8
[  220.163626] fossilize_repla[18186]: segfault at ad0 ip 00007f54810c8761 sp 00007f544e5fe320 error 4 in libnvidia-glcore.so.515.43.04[7f548048b000+1a6d000]
[  220.163633] Code: 54 49 89 d5 55 53 48 89 fb 48 89 d7 48 83 ec 18 48 89 74 24 08 ff 53 20 48 89 df 89 c6 89 c5 e8 d5 fd ff ff 41 89 c6 48 8b 03 <4e> 8b 3c f0 4d 85 ff 74 25 45 31 e4 0f 1f 00 41 39 6f 18 75 0e 49
[  242.652728] fossilize_repla[18946]: segfault at c10 ip 0000555aa8b78dae sp 00007f544e5fe760 error 4 in fossilize_replay[555aa8b4d000+1b6000]
[  242.652738] Code: 00 45 85 c0 0f 85 7b ff ff ff f3 c3 f3 c3 f3 c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 38 <4c> 8b 27 41 0f b6 ac 24 09 0d 00 00 40 84 ed 74 11 48 83 c4 38 89

It might not be nvidia-bug-report.sh that caused the additional errors, although I am not sure why fossilize_replay would be interacting with the kernel driver. Honestly, my gut feeling is that I have no clue what caused the errors at this point. :/

ryao, May 13 '22 03:05

I noticed that I was not booting with IOMMU support turned on, so I turned it on and booted again. This time, I did not open Chromium until after nvidia-bug-report.sh had run, and far fewer entries appeared in dmesg.

nvidia-bug-report.log.gz

Interestingly, I am not seeing many messages in dmesg, even after starting Chromium, now that the IOMMU is properly configured.
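For anyone comparing setups, one rough way to check whether the IOMMU is active is to look for groups under `/sys/kernel/iommu_groups` and for IOMMU parameters on the kernel command line. This is a sketch under assumptions: the comment above does not say how IOMMU support was turned on (firmware setting or boot parameter), and the helper names `iommu_groups` and `iommu_cmdline_flags` are made up for illustration.

```python
import os


def iommu_groups(sysfs_dir="/sys/kernel/iommu_groups"):
    """Return the IOMMU group names under sysfs_dir, or [] if it cannot be read."""
    try:
        names = os.listdir(sysfs_dir)
    except OSError:
        return []
    # Group names are numeric strings; sort them in numeric order.
    return sorted(names, key=lambda name: (len(name), name))


def iommu_cmdline_flags(cmdline):
    """Pick out IOMMU-related parameters from a kernel command line string."""
    return [token for token in cmdline.split() if "iommu" in token]


if __name__ == "__main__":
    groups = iommu_groups()
    print(f"{len(groups)} IOMMU group(s) found")
    try:
        with open("/proc/cmdline") as f:
            print("cmdline flags:", iommu_cmdline_flags(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

An empty group list suggests the IOMMU is not in use, though it does not say why. The command line check only catches explicit parameters such as `intel_iommu=on` or `iommu=pt`.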

ryao, May 13 '22 03:05

Thank you for the report. We are tracking it internally as bug 3624003. The log spam should not impede normal driver operation, as the resources in question still get freed. But we are investigating the root cause anyway.

mtijanic, May 13 '22 09:05

Is this still being looked into? I've switched to open modules with driver 560, and I'm seeing lots of such messages while gaming (DXVK) and streaming (OBS). It looks like it also hurts performance, maybe due to VRAM exhaustion?

This usually comes with other messages:

[96196.335433] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96197.285734] NVRM: intermapRegisterDmaMapping: Failed to insert new mapping node for range 0xDF107A0000-0xDF107EFFFF!
[96197.285754] NVRM: rmapiMapWithSecInfo: Nv04Map: map failed; status: Found a duplicate entry in the requested btree [NV_ERR_INSERT_DUPLICATE_NAME] (0x00000019)
[96197.286862] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96197.286875] NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:4798
[96197.286878] NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:2274
[96197.331062] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96197.488877] NVRM: intermapRegisterDmaMapping: Failed to insert new mapping node for range 0xDF107A0000-0xDF107EFFFF!
[96197.488892] NVRM: rmapiMapWithSecInfo: Nv04Map: map failed; status: Found a duplicate entry in the requested btree [NV_ERR_INSERT_DUPLICATE_NAME] (0x00000019)
[96197.489849] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96197.489855] NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:4798
[96197.489858] NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:2274
[96197.496874] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96197.508124] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96198.294541] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96198.350246] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96199.290161] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96199.350453] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96200.288239] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96200.342010] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96201.302619] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96201.350633] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)

and these:

# dmesg | grep btree
[96193.265031] NVRM: rmapiMapWithSecInfo: Nv04Map: map failed; status: Found a duplicate entry in the requested btree [NV_ERR_INSERT_DUPLICATE_NAME] (0x00000019)
[96194.264567] NVRM: rmapiMapWithSecInfo: Nv04Map: map failed; status: Found a duplicate entry in the requested btree [NV_ERR_INSERT_DUPLICATE_NAME] (0x00000019)
[96195.284501] NVRM: rmapiMapWithSecInfo: Nv04Map: map failed; status: Found a duplicate entry in the requested btree [NV_ERR_INSERT_DUPLICATE_NAME] (0x00000019)
[96196.289076] NVRM: rmapiMapWithSecInfo: Nv04Map: map failed; status: Found a duplicate entry in the requested btree [NV_ERR_INSERT_DUPLICATE_NAME] (0x00000019)
[96197.285754] NVRM: rmapiMapWithSecInfo: Nv04Map: map failed; status: Found a duplicate entry in the requested btree [NV_ERR_INSERT_DUPLICATE_NAME] (0x00000019)
[96197.488892] NVRM: rmapiMapWithSecInfo: Nv04Map: map failed; status: Found a duplicate entry in the requested btree [NV_ERR_INSERT_DUPLICATE_NAME] (0x00000019)
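Instead of grepping for one message at a time, the NVRM status lines can be tallied by function and error name in one pass. This is a minimal sketch assuming the two log styles seen in this thread (`NVRM func:` and `NVRM: func:`); `tally_statuses` and `SAMPLE` are illustrative names, not part of the driver or its tooling.

```python
import re
from collections import Counter

# Captures the reporting function, the NV_ERR_* name and the hex status code.
STATUS_RE = re.compile(
    r"NVRM:?\s+(?P<func>\w+):.*\[(?P<name>NV_ERR_\w+)\]\s+\((?P<code>0x[0-9a-fA-F]+)\)"
)


def tally_statuses(dmesg_text):
    """Count NVRM status lines, keyed by (function, error name)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            counts[(match["func"], match["name"])] += 1
    return counts


# Lines copied from the dmesg excerpts in this thread.
SAMPLE = """\
[   39.219138] NVRM rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96197.285754] NVRM: rmapiMapWithSecInfo: Nv04Map: map failed; status: Found a duplicate entry in the requested btree [NV_ERR_INSERT_DUPLICATE_NAME] (0x00000019)
[96197.286862] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[96197.286875] NVRM: nvAssertFailedNoLog: Assertion failed: NV_OK == status @ gpu_vaspace.c:4798
"""

if __name__ == "__main__":
    for (func, name), count in tally_statuses(SAMPLE).most_common():
        print(f"{count:4d}  {func}  {name}")
```

Assertion lines carry no bracketed status, so they are skipped. A separate pattern would be needed to count those.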

After these messages appear, desktop performance becomes stuttery, especially when opening new windows.

I wonder if it's also causing OBS to no longer be able to record with NVENC, but that may be a completely different issue (it fails with an unknown CUDA error).

I'm still using X11 because Wayland is not stable enough: windows or the game turn black as soon as VRAM is almost completely filled.

nvidia-bug-report.log.gz

kakra, Jul 27 '24 02:07

Ran into this on NixOS, but I only get the NV_ERR_OBJECT_NOT_FOUND spam. It happened while playing a game through DXVK+gamescope and then receiving a notification through dunst on a PRIME Reverse Offload connected display. The machine is a 3060 laptop with an Intel i7-12700something.

MagicRB, Sep 20 '24 20:09

Same here while running Star Citizen; I'm pretty sure this is from VRAM exhaustion.

Got a log full of these:

[35600.137991] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)
[35658.960645] NVRM: rmapiUnmapWithSecInfo: Nv04Unmap: ummap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)

Driver version is 565.77.

While the card sits at 9.8 GB used, I can't do anything other than play the game. Opening a new tab or even right-clicking anything will probably make that application crash or halt until enough memory is freed.
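To line VRAM pressure up with the log timestamps, memory usage can be sampled with `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`, which prints one `used, total` pair in MiB per GPU. The sketch below wraps that call; the helper names and the sample numbers in the comment are made up for illustration, and it assumes `nvidia-smi` is on the PATH.

```python
import subprocess


def parse_memory_csv(output):
    """Parse 'used, total' MiB pairs, one line per GPU; skip unparseable lines."""
    rows = []
    for line in output.strip().splitlines():
        try:
            used, total = (int(field.strip()) for field in line.split(","))
        except ValueError:
            continue  # e.g. a field reported as [N/A]
        rows.append((used, total))
    return rows


def query_vram():
    """Ask nvidia-smi for used and total memory of every GPU, in MiB."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)


if __name__ == "__main__":
    # Illustrative input only: parse_memory_csv("9800, 10240\n") gives [(9800, 10240)].
    try:
        for index, (used, total) in enumerate(query_vram()):
            print(f"GPU {index}: {used} / {total} MiB ({used / total:.0%})")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("nvidia-smi not available:", exc)
```

Running this in a loop alongside `dmesg -w` would show whether the unmap failures start once usage approaches the total, which is the hypothesis here rather than a confirmed cause.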

For example, starting a YouTube video only works while the window is minimized; when it is visible, the behavior is just undefined.

I'm on Wayland, which has major issues with screen recording, especially with SDL applications throttling when not active. I think that is also why loading videos is possible at that point: the window is throttled to effectively zero frames.

sfjuocekr, Dec 27 '24 21:12