oneTBB
oneTBB copied to clipboard
Application using scalable allocator crashes on Windows 11 when opening a file dialog
A minimal program was created to reproduce the issue. The program opens a file dialog using GetOpenFileName.
On Windows 10, the program runs successfully both with and without the TBB scalable allocator.
On Windows 11, the program crashes when the scalable allocator is used. The crash occurs right after the file dialog displays. The application runs successfully with the scalable allocator removed.
The TBB allocator was enabled using the linker option:
tbbmalloc_proxy.lib /INCLUDE:"__TBB_malloc_proxy"
The code:
#include <windows.h>
#include <Commdlg.h>
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
{
OPENFILENAME ofn;
char szFile[500];
ZeroMemory(&ofn, sizeof(ofn));
ofn.lStructSize = sizeof(ofn);
ofn.hwndOwner = NULL;
ofn.lpstrFile = szFile;
ofn.lpstrFile[0] = '\0';
ofn.nMaxFile = sizeof(szFile);
ofn.lpstrFilter = "All\0*.*\0Text\0*.TXT\0";
ofn.nFilterIndex = 1;
ofn.lpstrFileTitle = NULL;
ofn.nMaxFileTitle = 0;
ofn.lpstrInitialDir = NULL;
ofn.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;
GetOpenFileName(&ofn);
MessageBox(NULL, ofn.lpstrFile, "File Name", MB_OK);
return 0;
}
@myoung301 can you provide more details :
- compiler (and version) being used
- call stack of the crash
- version of the build (debug /release )
@anton-potapov
compiler: Visual Studio 2017 (15.0.0) version: release
Call stack from windbg:
ntdll!RtlReportFatalFailure+0x9
ntdll!RtlReportCriticalFailure+0x97
ntdll!RtlpHeapHandleError+0x12
ntdll!RtlpHpHeapHandleError+0x7a
ntdll!RtlpLogHeapFailure+0x45
ntdll!RtlpFreeHeapInternal+0x825
ntdll!RtlFreeHeap+0x51
SHELL32!operator delete+0x28
SHELL32!CFSIconOverlayManager::MaybeAddIconOverlayHandlersFromStateRepo+0x185464
SHELL32!CFSIconOverlayManager::LoadIconOverlayIdentifiers+0xb7
SHELL32!CFSIconOverlayManager::_s_LoadIconOverlayIdentifiers+0x76
SHELL32!CFSIconOverlayManager::LoadNonloadedOverlayIdentifiers+0x13c
SHELL32!EnableExternalOverlayIdentifiers+0x4a
SHELL32!CFSIconOverlayManager::RefreshOverlayImages+0xc0
SHELL32!SHELL32_GetIconOverlayManager+0x98
windows_storage!CFSFolder::_GetOverlayInfo+0x124
explorerframe!CNscOverlayTask::_Extract+0x65
explorerframe!CNscOverlayTask::InternalResumeRT+0x55
explorerframe!CRunnableTask::Run+0xce
windows_storage!CShellTask::TT_Run+0x46
windows_storage!CShellTaskThread::ThreadProc+0xdd
windows_storage!CShellTaskThread::s_ThreadProc+0x44
shcore!ExecuteWorkItemThreadProc+0x20
ntdll!RtlpTpWorkCallback+0x171
ntdll!TppWorkerThread+0x686
KERNEL32!BaseThreadInitThunk+0x10
ntdll!RtlUserThreadStart+0x2b
@zheltovs could you please take a look?
@myoung301, we were not able to reproduce the issue at the moment. Does it shows all the times ? Or sporadically ? As well could you please specify the exact command line used to compile the sample ?
@myoung301, we are able to reproduce the issue with your steps and got same call stack. Analysis is in progress.
Hi, could you please share some more details regarding the reproduction? We received a crash dump from a customer showing the exact same call stack, but I failed to reproduce the issue so far.
What I tried: I compiled the example code above on Windows 10 (not 11) using Visual Studio 2019, Windows SDK 10.0.19041.0, C++14, Release, cl.exe 19.29.30137. The scalable allocator of oneTBB comes from revision 4eec89fe.
I then copied the binaries to the Windows 11 Professional machine which uses a clean install of Win11 (and installing all offered updates). I tried it with versions 10.0.22000.282 and 10.0.22000.348 of the shell32.dll and the Windows.FileExplorer.Common.dll, to no avail.
After digging around in shell32.dll, I guessed that the crash might have something to do with cloud sync tools such as owncloud creating an entry under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\SyncRootManager
and syncing the user's Desktop. But unfortunately, I was still not able to reproduce the issue.
It seems some specific Windows setting or software is required to trigger the problem?
We have customer reports of crashes showing the same stack trace. However, we use mimalloc as a custom allocator instead of oneTBBs. So the issue does not seem specific to oneTBB, but is more a general windows issue.
However we also failed to reproduce the crashes on our dev machines or local VMs. I agree with @Sedeniono assessment that there must be some specific setting or software (probably containing explorer items) that triggers the crash.
I and a colleague both confirmed that if we sign out of OneDrive, the crash goes away. We have to be careful to fully sign out (the 'Unlink this PC' option in settings).
We've also reproduced the problem with mimalloc, as mentioned. So there is an issue when using either custom allocator and OneDrive on Windows 11.
@myoung301 @vlserov I can confirm that the example posted above crashes both with oneTBB and mimalloc if OneDrive is syncing the Desktop. By that I mean OneDrive's option to backup the Desktop. Note that if you enable this option, the Desktop folder itself gets moved under the OneDrive folder.
After some more debugging, I can confirm that one ingredient of the crash is indeed the registry path I mentioned above. More precisely, one requirement for the crash to occur is that the SID-key ("S-1-5-21-...") under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\SyncRootManager\OneDrive!......!101\UserSyncRoots
needs to contain a path that contains the Desktop. The path by default points to C:\Users\<User>\OneDrive
, and the Desktop got moved by OneDrive to C:\Users\<User>\OneDrive\Desktop
. If you modify the SID-key to e.g. C:\Users\<User>\OneDriveNotExisting
, the crash no longer occurs. If you "Unlink this PC" in the OneDrive settings, the entire registry key gets deleted.
In principle, the same could apply to any cloud sync tool (although I have, unexpectedly, not observed it yet with ownCloud).
I tested it with versions 10.0.22000.282 and 10.0.22000.348 of the shell32.dll and the Windows.FileExplorer.Common.dll files, and in both versions the FileOpen-Dialog crashes.
In short, I think it is a bug in shell32.dll, where memory is allocated via a redirected malloc but then free'd directly (bypassing mimalloc/oneTBB); maybe someone can confirm this. More details below.
One workaround is to ensure that the Desktop is not located under the OneDrive folder, or any other synced folder that is mentioned in the SyncRootManager
registry key. Besides this, a temporary solution is of course to disable the custom allocator, e.g. via the environment variable.
More details: All of the following analysis used version 10.0.22000.282 of the shell32.dll and the Windows.FileExplorer.Common.dll. There are some minor difference compared to 10.0.22000.348, but none that matter.
The following image shows the first relevant part of shell32.dll!CFSIconOverlayManager::MaybeAddIconOverlayHandlersFromStateRepo
:
Notice that it first gets the path to the Desktop (SHGetKnownFolderPath()
). Then the first virtual function call (the first __guard_xfg_dispatch_icall_fptr
) calls CSyncRootManager::GetStorageProviderInfoFromPath
which only returns something relevant if the Desktop is synced. This checks the registry key mentioned above.
If this is the case, the second virtual function call CStorageProviderInfo::GetIconOverlayHandlerClsidArray
is reached. This function in turn allocates memory if additional conditions are met. I am not entirely sure about them: The conditions are met in case of OneDrive, but not in case of ownCloud, and I failed to find the systematic differences between these two applications (both put stuff into the SyncRootManager
registry path and both register icon overlays). In any case, for OneDrive it continues to allocate memory via windows.fileexplorer.common.dll!operator new
:
which in turn calls
ucrtbase!malloc
, which is redirected (in this case by mimalloc):
Now, back in shell32.dll!CFSIconOverlayManager::MaybeAddIconOverlayHandlersFromStateRepo
, the code continues to do stuff with the allocated memory, and finally tries to delete it:
It calls
shell32.dll!operator delete[]
, which calls ntdll!FreeHeap
directly:
No references to mimalloc in the
FreeHeap
callstack as far as I can see. In the end, the ntdll!RtlLogHeapFailure
is called.
So, I think we have a Microsoft bug in shell32.dll: Memory is freed not via an ordinary (and thus redirected) free()
but rather directly via the Heap-functions, which seems incorrect to me. The memory allocation in windows.fileexplorer.common.dll, on the other hand, does use the ordinary malloc()
correctly. This mismatch probably causes the crash?
for MS folks, http://task.ms/38281009
@ChrisGuzak Am I interpreting your post correctly that the shell32.dll team at Microsoft is now aware of the issue? Otherwise, what does your link point to?
@ChrisGuzak Is this article meant to be access by non MS people? Or can we get access to it?
@ChrisGuzak what is the status on this?
So MSFT appears to have fixed this issue with KB5020044 (preview) or KB5021255 (stable).
The release notes for KB5020044 has this bullet point:
It addresses an issue that might cause certain apps to stop responding. This occurs when you use the Open File dialog.
Our preliminary testing does suggest the crash we experienced on Windows 11 with the Open File dialog box has gone away with this Windows update.
Digging into the Windows binaries (thanks @Sedeniono for his deep dive), previously (Windows 11 build 22000) CStorageProviderInfo::GetIconOverlayHandlerClsidArray allocates memory using operator new[](size_t, std::nothrow_t)
?GetIconOverlayHandlerClsidArray@CStorageProviderInfo@@UEAAJPEAIPEAPEAU_GUID@@PEAH@Z:
00000001800284E0: 48 89 5C 24 08 mov qword ptr [rsp+8],rbx
00000001800284E5: 48 89 6C 24 10 mov qword ptr [rsp+10h],rbp
00000001800284EA: 48 89 74 24 18 mov qword ptr [rsp+18h],rsi
...
0000000180028556: E8 D1 B2 FD FF call ??_U@YAPEAX_KAEBUnothrow_t@std@@@Z
...
00000001800285B2: 48 83 C4 20 add rsp,20h
00000001800285B6: 41 5F pop r15
00000001800285B8: 41 5E pop r14
00000001800285BA: 5F pop rdi
00000001800285BB: C3 ret
which is then freed with HeapFree in shell32.dll via operator delete. As noted earlier, this is the reason for the crash we were seeing.
In Windows 11 build 22621, CStorageProviderInfo::GetIconOverlayHandlerClsidArray now calls a helper function CopyClsidsToCoTaskMemAlloc
?GetIconOverlayHandlerClsidArray@CStorageProviderInfo@@UEAAJPEAIPEAPEAU_GUID@@PEAH@Z:
00000001800496E0: 41 83 21 00 and dword ptr [r9],0
00000001800496E4: 48 81 C1 48 02 00 add rcx,248h
00
00000001800496EB: E9 08 D7 FF FF jmp ?CopyClsidsToCoTaskMemAlloc@@YAJAEBV?$vector@U_GUID@@V?$allocator@U_GUID@@@std@@@std@@PEAIPEAPEAU_GUID@@@Z
which in turn calls CoTaskMemAlloc to allocate memory.
?CopyClsidsToCoTaskMemAlloc@@YAJAEBV?$vector@U_GUID@@V?$allocator@U_GUID@@@std@@@std@@PEAIPEAPEAU_GUID@@@Z:
0000000180046DF8: 48 8B C4 mov rax,rsp
0000000180046DFB: 48 89 58 08 mov qword ptr [rax+8],rbx
...
0000000180046E62: 48 FF 15 A7 10 03 call qword ptr [__imp_CoTaskMemAlloc]
00
...
0000000180046ED6: 48 83 C4 20 add rsp,20h
0000000180046EDA: 41 5F pop r15
0000000180046EDC: C3 ret
Instead of operator delete/HeapFree, this memory is then freed in shell32.dll by
wil::unique_any_array_ptr<_GUID, wil::function_deleter<void (*)(void *),CoTaskMemFree>, wil::empty_deleter>::reset()
The use of CoTaskMem* effectively work arounds the malloc/HeapFree debacle.
Regarding the use of HeapFree to free a malloc allocated pointer as seen in Windows 11 build 22000, I tend to believe this is not actually a MSFT bug.
Instead, I think link-time optimisation is to blame here. My hunch is that shell32.dll is built as part of the same project as ntdll.dll, which is how the body of malloc (which just calls RtlAllocateHeap with GetProcessHeap) got inlined into shell32.dll's operator new.