pymc-bart

multiprocessing.Manager() leaves child process alive after deleting Model object.

Open twj8CDC opened this issue 11 months ago • 5 comments

Hello,

I am not quite sure if this can be considered a bug, but I thought I would share. Feel free to close if this is too much of an edge issue.

I am running a simulation using the BART model that involves creating and deleting the PyMC model (with a BART component) in each iteration. I noticed that as I went through iterations I would accumulate Python processes that were no longer using CPU but appeared to hold memory (~50-100 MB each).

When many iterations were done I started having OOM issues due to these processes gradually taking up memory. These processes would die once the main process dies.

These processes were not the multi-chain/multi-thread processes used in the training/inference (those associated processes were spun-up/down correctly).

I believe the issue comes from the multiprocessing.Manager() used to create the 'all_trees' list.
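As a minimal sketch of the mechanism (plain multiprocessing, not pymc-bart code): a started Manager runs a separate server process, which appears among the parent's active children until the manager is shut down. The function name count_manager_children is hypothetical and only for illustration.

```python
import multiprocessing as mp


def count_manager_children():
    """Count live child processes before, during, and after a Manager's lifetime."""
    before = len(mp.active_children())
    manager = mp.Manager()           # starts the manager's server process
    shared = manager.list()          # proxy to a list living in that process
    shared.append("tree")
    during = len(mp.active_children())
    manager.shutdown()               # stops the server process
    after = len(mp.active_children())
    return before, during, after


if __name__ == "__main__":
    print(count_manager_children())
```

If the manager is never shut down and something keeps it referenced, that extra process would be expected to stay alive, which matches the behavior described above.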

To resolve the issue I used the following codeblock after each iteration was complete.

import multiprocessing as mp

# kill every child process still alive (this includes the Manager's server process)
for child in mp.active_children():
    child.kill()

This resolves the issue of lingering processes.
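For a long simulation loop, the same workaround could be wrapped in a helper and called from a finally block so that it also runs when an iteration fails. This is only a sketch: reap_children and run_iteration are hypothetical names, and run_iteration stands in for "build the model, sample, delete the model".

```python
import multiprocessing as mp


def reap_children(timeout=5.0):
    """Kill and join every live child of this process; return how many were found."""
    children = mp.active_children()
    for child in children:
        child.kill()              # forcefully stop the child
    for child in children:
        child.join(timeout)       # wait so the process is fully cleaned up
    return len(children)


def run_iteration(i):
    # Hypothetical placeholder for one simulation iteration.
    return i * i


if __name__ == "__main__":
    results = []
    for i in range(3):
        try:
            results.append(run_iteration(i))
        finally:
            reap_children()       # runs even if the iteration raises
    print(results)
```

Note that this kills all children of the main process, so it is only safe when nothing else in the script relies on long-lived worker processes.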

I am not sure if this should be considered a bug or not, since it only becomes an issue when a high number of BART models are created in a single Python script. And I don't know if there is really a good general solution, because if you kill the child process created by the Manager too early, I would expect there to be issues with further use of the model.

That being said, I could see other users running into this issue when running a highly iterative process, and generally I would say that having a process that doesn't die when the model is deleted is unexpected behavior. So I just wanted to share my experience for future users' reference.

Feel free to close or remove this submission if it is unhelpful.

Thanks!

twj8CDC, Mar 07 '24 18:03

Hi, thanks for sharing. I think this is a bug even if it only affects a portion of users, and it is also related to the issues people have been observing on Mac. I'm not sure of a good general solution either.

aloctavodia, Mar 08 '24 15:03

One potential solution could be to capture the PID of the manager when it is created (in the BART class). Then add a destructor (__del__) that kills that process when the instance is deleted.

A simple example of this:

import multiprocessing as mp
import psutil as ps

# create class
class c1():
    def __init__(self):
        self.a = 1
        manager = mp.Manager()
        # collect the pid for the manager 
        self.process = ps.Process(manager._process.ident)
        self.lst = manager.list()
        
    def __del__(self):
        print("DELETING PROCESS")
        self.process.kill()
    
    def get_process_id(self):
        # return the pid so that print(obj.get_process_id()) shows it instead of None
        return self.process.pid

class c2():
    def __init__(self):
        self.c11 = c1()
        print("CREATED A NEW MANAGER")
        print(self.c11.get_process_id())
# create an instance of the class with a Manager
c11 = c1()

# print the process id
print("This is the manager pid")
print(c11.get_process_id())
# print the active children (process id should match)
print("Above should be in this list")
print(mp.active_children())
print("Deleting the object will kill the manager process")
del c11
print("The list shouldn't contain the process")
print(mp.active_children())
# works when class is contained in another class
c22 = c2()
print(mp.active_children())
del c22
print(mp.active_children())

As far as I can tell, the BART class instance persists for as long as the higher-level model instance is in use, so I wouldn't expect this process to be killed before the model instance is deleted. Based on this simple example, I believe that deleting the model instance will result in the BART instance being deleted and the process being properly killed. But I am not super familiar with all of the PyMC internals, so this approach could also cause some unexpected issues.
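A variation that avoids psutil and the private _process attribute would be to keep a reference to the manager itself and call its documented shutdown() method. The sketch below is an untested assumption with respect to pymc-bart: ManagedTrees is a hypothetical stand-in class, and whether storing the manager on the real BART object interferes with pickling or sampling is not something established here.

```python
import multiprocessing as mp


class ManagedTrees:
    """Hypothetical stand-in for an object that owns a Manager-backed list."""

    def __init__(self):
        self._manager = mp.Manager()             # starts the server process
        self.all_trees = self._manager.list()    # proxy list held by the manager

    def close(self):
        # Safe to call more than once, and safe if __init__ failed early.
        manager = getattr(self, "_manager", None)
        self._manager = None
        if manager is not None:
            manager.shutdown()                   # stop the manager's server process

    def __del__(self):
        self.close()


if __name__ == "__main__":
    obj = ManagedTrees()
    obj.all_trees.append(1)
    print(len(mp.active_children()))   # includes the manager's server process
    obj.close()
    print(len(mp.active_children()))   # the server process should be gone
```

An explicit close() is included because __del__ is not guaranteed to run promptly (or at all) if the object is part of a reference cycle.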

twj8CDC, Mar 08 '24 16:03

Would you like to give it a try and send a PR?

aloctavodia, Mar 12 '24 22:03

Yeah sure. Might be a few weeks before I can get to it, but I will give it a try.

twj8CDC, Mar 13 '24 12:03

Thank you! Take your time.

aloctavodia, Mar 13 '24 13:03