ZipUtility-Unreal icon indicating copy to clipboard operation
ZipUtility-Unreal copied to clipboard

Possible race condition in WFULambdaRunnable

Open JohnKovalsky opened this issue 7 years ago • 3 comments

In file WFULambdaRunnable.cpp from commit 5208dd878aaf51ba2520b26c1f2124bf6402f699 in line 16 takes place creation of thread.

Thread = FRunnableThread::Create(this, *threadStatGroup, 0, TPri_BelowNormal); //windows default = 8mb for thread, could specify more

Freshly created thread could end before execution of this line is done. In such a case:

  1. Finished thread call virtual method ~~Stop~~ Exit from line 56.
  2. ~~Stop~~ Exit destroys heap allocated this object.
  3. Then comes execution of line ~~17~~ 16, which is accessing member Thread of just destroyed this object, that could lead to memory corruption.

Possible solution is to use FCriticalSection to ensure atomic execution of WFULambdaRunnable constructor as well as ~~Stop~~ Exit method.

JohnKovalsky avatar Jun 05 '17 23:06 JohnKovalsky

Interesting. So WFULambdaRunnable has a lot of pointless lines left over from early UE4 exploration but it is generally only used as a simple throwaway thread for long lambdas. Cleanup suggestions for the class are very welcome, ideally just getting rid of all the extra cruft that doesn't do anything useful would do wonders.

That said I don't think the race condition you propose should happen. Line 17 increments a class scoped static uint64, not a member variable (that should really be changed to a thread safe counter or two threads could potentially share the same name).

Also looking through the epic codebase for Windows thread creation (WindowsRunnableThread.h#L186 and WindowsRunnableThread.cpp#L68) it appears they use a sync event to ensure init gets called first and then only a thread kill signal (WindowsRunnableThread.h#L149) can cause a FRunnable::Stop call. Even if a stop call does happen, it only changes the FThreadSafeBool Finished state which is a member of WFULambdaRunnable not FRunnableThread. The WFULambdaRunnable member variables only get otherwise affected by the ~WFULambdaRunnable() destructor which gets exclusively called on Exit(). Looking at it now, it does seem like a leak is possible if the thread gets a Stop() call, since Exit() it doesn't get called in that instance (yay assumptions).

This leaves the ~WFULambdaRunnable() destructor which thanks to WFULambdaRunnable.cpp#L24 line stops the destructor from doing anything useful at all (facepalm).

Anyway despite all that I agree that init/destruct should be FCriticalSection scoped. Perhaps the better question is though, do we need anything apart from creating the lambda thread and cleanup?

getnamo avatar Jun 06 '17 06:06 getnamo

First of all: I made lot of mistakes in issue description. I'm really sorry, I did it late at night. Now i'v just edited the content so it could be more precise. Most important thing I was was wrong about is Stop method name: it should be Exit!

That said I don't think the race condition you propose should happen. Line 17 increments a class scoped static uint64, not a member variable (that should really be changed to a thread safe counter or two threads could potentially share the same name).

You've right, I made mistake. It should be line 16, which is assigment of non-static member Thread. In case I described above this could write memory that have been previously deallocated in Exit method (as thread was fast enought to finish before execution of line 16 is done).

... it appears they use a sync event to ensure init gets called first and then only a thread kill signal (WindowsRunnableThread.h#L149) can cause a FRunnable::Stop call

That seems to be true, but mechanism you've described ensure only that Init method is called before factory method finishes. It's not guarante that Run method will finish after thread creation is clompleted (including possible following assigments).

Anyway despite all that I agree that init/destruct should be FCriticalSection scoped. Perhaps the better question is though, do we need anything apart from creating the lambda thread and cleanup?

That's good question. If we could answer it positively maybe there is no need to store Thread variable. I such case it will be enought to use specialized factory method of FRunnableThread, which automatically destroys thread and associated object after it finishes.

JohnKovalsky avatar Jun 06 '17 21:06 JohnKovalsky

I was thinking about what wanted out of a Lambda thread and why I use the architecture rather than FRunnables and it really boils down to context capture and inline coding. Using a specialized FRunnableThread::Create could make sense, but I feel like the custom wrapper class allows for a slightly higher level approach, which means less code for common use cases.

Thinking about use cases, having something with basic options and good defaults is probably the best way forward.

FLambdaRunnable* FLambdaRunnable::RunOnBackThread(TFunction< void()> RunCallback,  TFunction< void()> EarlyKillCallback = nullptr, EThreadPriority InThreadPri = TPri_Normal);

something like that is what I'm leaning towards. Maybe an overload or a second convenience function with bShouldFinish passed in e.g.

FLambdaRunnable* FLambdaRunnable::RunStoppableOnBackThread(TFunction< void(FThreadSafeBool& bShouldFinish)> RunCallback,  TFunction< void()> EarlyKillCallback = nullptr, EThreadPriority InThreadPri = TPri_Normal);

the boolean could be polled in the thread for graceful stop requests (useful if using run loops), we would feed in a reference to a threadsafe boolean member variable to make it work. Advantage of having it fed in vs context captured is that this would be self contained inside the FLambdaRunnable, so the user could do Stop() and the boolean would switch. This saves us of needing to track two things out of the initial call scope vs just one, the FLambdaRunnable*.

First lambda gives you an option to handle early kill callbacks and do a quick cleanup in that event, but if your lambda doesn't need it you can ignore it, while the second variance also supports graceful loop exits.

e.g. use cases

FLambdaRunnable::RunOnBackThread([]
{
    //simple lambda all data allocs may be stack allocated or handled elsewhere, no cleanup reqs
});

and

FLambdaRunnable::RunOnBackThread([this]
{
    //referencing scoped vars maybe in a long running op/loop
},
{
    //got an early kill, let's cleanup as our op may have forcibly been killed
});

A pointer to thread should still be needed internally so we can request thread kills in cases where we need to quit immediately. FLambdaRunnable should then have both a clean Stop() method and a Kill(bool bShouldAwaitCompletion) method, later of which would issue thread->kill.

Game thread lambdas would still use the taskgraph system since the gamethread should never stall (and task graph functions cannot take longer than 2-3 seconds or they stall the engine so this is appropriate). These are mainly used to communicate back from background threads. No change in behavior for that function, maybe just a rename.

FGraphEventRef FLambdaRunnable::RunOnGameThread(TFunction< void()> RunCallback);

Thoughts?

getnamo avatar Jun 07 '17 17:06 getnamo