Detours icon indicating copy to clipboard operation
Detours copied to clipboard

Fix if the hook is not called at the start of the process, the hook may fail

Open sonyps5201314 opened this issue 3 years ago • 9 comments

for example, create a c++ console project, create many threads to call CreateFileW,ReadFile,CloseHandle and create serveral threads to execute Hook and Unhook CreateFileW and ReadFile function every 100 milliseconds. it will be failed after some time.

sonyps5201314 avatar Sep 04 '20 08:09 sonyps5201314

@bgianfo, I have rebased to only include this commit.

sonyps5201314 avatar Sep 05 '20 05:09 sonyps5201314

Hey @sonyps5201314 thanks again for the contribution.

@dtarditi and I looked at this togeather today, and we decided the change as it is, is very risk to merge since it's so large and invasive. Can you please desribe what the actual race condition / bug this is fixing is so we can work on trying to scope down this change? It's not clear from the description or the commits where the bug you are fixing actually.

Is it also possible to introduce a stress test / unit test to validate we have fix the issue and don't regress it in the future?

bgianfo avatar Mar 04 '21 23:03 bgianfo

Is it also possible to introduce a stress test / unit test to validate we have fix the issue and don't regress it in the future?

@bgianfo TestDetoursBug144.zip is a BUG reproduction program. Compile, debug and run the DebugMaster configuration, the program will crash immediately, because it uses the official unfixed master branch code. image

Compile, debug and run the DebugMyFixedBranch configuration, the program will run normally, because it uses the fixed fix_maybe_can_not_hook_functions_not_at_the_starttime_of_a_process code in my PR.

Note: It is recommended to use VS2013 to compile and debug the program of this project, because VS2019 and VS2017 have the bug that the call stack of the program cannot be viewed when the x86 program crashes or deadlocks. It has been reported for almost a year, and the official hasn't repaired it yet. vs2017 and vs2019 can`t show full call stack when a deadlock occurs in x86 c++.

sonyps5201314 avatar Mar 06 '21 14:03 sonyps5201314

@bgianfo Can you please desribe what the actual race condition / bug this is fixing

The original commit seems to mostly be related to #70 (though covers a broader problem): any new/delete/malloc/free etc after any DetourUpdateThread() is risky:

In windows, there is a per-heap lock, which new/delete/malloc/free always need to acquire, and in most applications, practically all operations will be on the same default heap - in effect, roughly speaking, there's a per-process lock on heap operations. See https://docs.microsoft.com/en-us/windows/win32/memory/heap-functions

DetourUpdateThread() ultimately calls SuspendThread(); after this point, if the thread you've passed in happens to already have the heap lock, nothing else can acquire the lock, and all new/delete/malloc/free will deadlock. This includes:

  • the explicit new operations in DetourUpdateThread() - e.g. if you're updating two threads, and the first one has the heap lock, the second DetourUpdateThread() will deadlock
  • the delete operations in DetourTransactionCommit
  • the malloc inside the printfs inside DETOUR_TRACE

You can reproduce this by:

  • creating 3 threads
  • in thread 1, call HeapLock(GetProcessHeap())
  • just loop in thread 2
  • in thread 3, call DetourTransactionBegin(); DetourUpdateThread(thread_1_handle); DetourUpdateThread(thread_2_handle);

--

There are also several other bugs fixed in later commits; I'm not came across them in practice, so I'm not able to explain them.

--

As for how this relates to 'at the start of the process':

  • updating all threads is a relatively common (and necessary) pattern, which significantly raises the risk of hitting this condition. This PR adds a DetourUpdateAllOtherThreads() function to make this pattern easier, but that's tangental to the bug fix.

  • if you attach on process startup, there are many more mallocs/news happening than later in the process lifecycle: as well as the program code, there's all the implicit ones behind LoadLibrary() etc. Again, this significantly raises the risk

fredemmott avatar Jan 16 '22 17:01 fredemmott

Here is a small standalone example: https://gist.github.com/fredemmott/adb9c41d7cff44aecf92bece990f4c3f

The explicit lock is just to make it a reliable repro; you can reproduce it less artificially (and less reliably) by replacing the locking threadproc with:

while (true) {
  delete new int;
}

You can replace 'int' with any type.

fredemmott avatar Jan 16 '22 17:01 fredemmott

An alternative fix for that first issue would be to simply call HeapLock(GetProcessHeap()) in DetourTransactionBegin(), and HeapUnlock(GetProcessHeap()) in DetourTransactionCommit() + DetourTransactionAbort()

fredemmott avatar Jan 16 '22 18:01 fredemmott

The latest committed code makes Detours use VirtualAlloc/VirtualFree for memory allocation and deallocation, so there is no Heap lock problem you worry about.

// After calling DetourUpdateAllOtherThreads, you should call DetourTransactionCommit(Ex) or DetourTransactionAbort as soon as possible.
// In addition to the DetourAttach(Ex)\DetourDetach(Ex) call, other user operations should not be included between them,
// because other user operations may cause CRT lock competition, and at this time CRT lock may be owned by other threads.
// Other threads have been suspended at this time, so that may cause deadlock problems
LONG WINAPI DetourUpdateThreadEx(_In_ HANDLE hThread, _In_opt_ BOOL fCloseThreadHandleOnDestroyDetourThreadObject /*= TRUE*/);
BOOL WINAPI DetourUpdateAllOtherThreads();

and you can read the commented text above for DetourUpdateThreadEx and DetourUpdateAllOtherThreads to understand the precautions for using them. and from the above comments and the code from VS2008:

int __cdecl _heap_init (
        int mtflag
        )
{
#if defined _M_AMD64 || defined _M_IA64
        // HEAP_NO_SERIALIZE is incompatible with the LFH heap
        mtflag = 1;
#endif  /* defined _M_AMD64 || defined _M_IA64 */
        //  Initialize the "big-block" heap first.
        if ( (_crtheap = HeapCreate( mtflag ? 0 : HEAP_NO_SERIALIZE,
                                     BYTES_PER_PAGE, 0 )) == NULL )
            return 0;

you can know that HeapLock(GetProcessHeap()) ... is not safe enough. You might say use HeapLock((HANDLE)_get_heap_handle()); instead But if a thread already owns the lock and has been suspended by other threads of the user, you call HeapLock((HANDLE)_get_heap_handle()); it will only make the current thread fall into an infinite wait or deadlock, so The best way is to avoid using the HeapAlloc family of functions for memory allocation. and you must know that the purpose of DetourUpdateAllOtherThreads is to solve the problem that the HOOKed instruction is being run by other threads, not the Heap lock problem. Heap and CRT lock problems are only one problem that may be caused by DetourUpdateAllOtherThreads. Therefore, I introduced DetourMemAlloc, DetourMemReAlloc, DetourMemFree to avoid this problem. The code in the PR has also been used in many programs of my own and my company and has been running stably in many computers for a long time, so you can believe that they are reliable of.

sonyps5201314 avatar Jan 23 '22 13:01 sonyps5201314

But if a thread already owns the lock and has been suspended by other threads of the user, you call HeapLock((HANDLE)_get_heap_handle()); it will only make the current thread fall into an infinite wait or deadlock

This can generally be avoided by calling HeapLock() before suspending any threads.If something else in the process has already locked a thread which holds the lock, either that thread will eventually be resumed (in which case it's not an infinite wait), or the app will be hanging forever anyway. Either way, even if we can install the hook while something else has the heap lock, my detours (and the application code that calls them) still use the heap, so it isn't a practical concern, as long as my thread acquires a heap lock before suspending any threads.

so The best way is to avoid using the HeapAlloc family of functions for memory allocation.

I don't disagree on this, but it is not necessary for many cases.

The code in the PR has also been used in many programs of my own and my company and has been running stably in many computers for a long time, so you can believe that they are reliable of.

I'm looking for minimal - and understandable - changes on top of master. This PR fixes several issues (and I understand this is more than the heaplock race), and adds new features. Regardless of reliability for you, that makes it not fit my goals. Splitting it up with clear explanations would also likely aid Microsoft in reviewing and ideally merging.

That said, I want to be clear that I appreciate your insight and efforts for the community :)

fredemmott avatar Jan 23 '22 16:01 fredemmott

DWORD WINAPI ThreadProc_UserCode(LPVOID lpThreadParameter)
{
	Sleep(1000);
	HeapLock((HANDLE)_get_heap_handle());
	Sleep(10000000);
	HeapUnlock((HANDLE)_get_heap_handle());
	return 0;
}

DWORD WINAPI ThreadProc_TestHook(LPVOID lpThreadParameter)
{
	Sleep(2000);
	while (1)
	{
		OutputDebugStringA("ThreadProc_TestHook prepare for hook.\n");
		//replace "OutputDebugStringA" to "printf" function will trigger another deadlock

		HeapLock((HANDLE)_get_heap_handle());

		OutputDebugStringA("ThreadProc_TestHook start to hook.\n");
		DetourTransactionBegin();

		DetourUpdateThread(GetCurrentThread());

		DetourTransactionCommit();

		HeapUnlock((HANDLE)_get_heap_handle());
	}
	return 0;
}
int main()
{
	HANDLE hThread = CreateThread(NULL, 0, ThreadProc_UserCode, NULL, 0, NULL);
	if (hThread)
	{
		CloseHandle(hThread);
		hThread = NULL;
	}
	Sleep(100);

	hThread = CreateThread(NULL, 0, ThreadProc_TestHook, NULL, 0, NULL);
	if (hThread)
	{
		CloseHandle(hThread);
		hThread = NULL;
	}
	Sleep(100 * 1000);
}

The above is a simple example that doesn't do what you expect. In addition to our own controllable code, a program may also contain system library code, and code injected by third-party DLLs. The code in ThreadProc_UserCode may appear in third-party DLLs and system libraries. For example, in system library AcLayers.dll, it hooks the Heap series functions and has its own lock implementation for the Heap functions, so you can't think that the call to HeapLock returned, you won't Stuck in the next calls to functions such as HeapAlloc.

I don't disagree on this, but it is not necessary for many cases.

Detours, as a general purpose library program, must be able to accommodate all situations, not just most of what you said.

I'm looking for minimal - and understandable - changes on top of master.

image The latest commit has been merged with the latest official master branch, so your worries are gone.

This PR fixes several issues

No, all its modifications are for one purpose, which is the title of this PR. Please read the code carefully before expressing your opinion. As I said before, those Heap and CRT lock problems are just new problems that may arise after the introduction of DetourUpdateAllOtherThreads, so I improved the memory allocation implementation inside Detours to solve this new problem.

sonyps5201314 avatar Jan 24 '22 06:01 sonyps5201314