Add dedicated memory cleanup thread to fix memory leak
Hi,
This PR is meant to re-start the discussion around issue 376.
Problem description: Dr.Jit leaks memory for code where the main thread never gets a chance to execute Py_AddPendingCall enqueued cleanup routines. A typical example of this would be to run an optimization on a thread that is not the main thread, and in each iteration convert some NumPy array to a Dr.Jit tensor. While this seems unusual, this can easily happen when running Mitsuba & Dr.Jit code in complex pipelines (e.g., on a cluster) using tools such as Apache Beam.
Proposed solution: It seems difficult to fundamentally fix the issue of Py_AddPendingCall calls never being executed. One viable alternative appears to be to introduce a dedicated Dr.Jit cleanup daemon thread that periodically exectures Dr.Jit issued cleanup calls. This is very similar to the idea of Py_AddPendingCall, but avoids the issue of never actually executing the calls. This PR implements a simple version of this to start discussing if this makes sense. The daemon thread is started when Dr.Jit is imported, and then periodically runs cleanup callbacks. This seems to work, but I haven't tested it enough to be absolutely sure it cannot generate a deadlock. The current version is WIP in that the "busy waiting" of the cleanup thread is probably not ideal, and it would be better to switch to some signals-based waiting routine. I can do that if the general direction is deemed reasonable.
What do you think? Would it make sense to bring this to a mergable state? The current memory leak is quite challenging to deal with in practice, and might be hard to debug for others.
Hi Delio, I am not opposed to the idea. But I think it's lower overhead to realize this with a C++ thread that waits on a condition_variable and then acquires the GIL when it receives an object to free.
ah that's a good idea, I can take a stab at this next week and update the PR
I've updated the code to use a C++ thread and a condition variable to signal that items should be cleaned up.
I removed the possibly flaky test and added a sync_thread call to the atexit routine. Let me know if anything else should be changed
Thanks, merged!
Thanks Wenzel!