gflownet

Make multiprocessing terminate gracefully

Open · bengioe opened this issue 2 years ago · 2 comments

The current multiprocessing/threading routines are never explicitly stopped; they rely on the objects that own them being garbage-collected in order to shut down. This sometimes produces aesthetically displeasing logs in which every thread raises an error.
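A minimal sketch of explicit shutdown, as opposed to waiting for garbage collection: the hypothetical Worker class below owns a thread, and close() pushes a sentinel onto its task queue and then joins the thread. Nothing here is taken from the gflownet codebase; the class and method names are illustrative. It uses threading for brevity, and the same sentinel-then-join pattern is assumed to carry over to multiprocessing.Process with a multiprocessing.Queue.

```python
import queue
import threading


class Worker:
    """Hypothetical worker that owns a thread and must be stopped explicitly."""

    _SENTINEL = object()  # unique marker telling the thread to exit

    def __init__(self):
        self.tasks = queue.Queue()
        self.results = []
        # daemon=True is only a safety net; close() is the intended way to stop.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self.tasks.get()
            if item is self._SENTINEL:
                break  # leave the loop cleanly instead of dying at interpreter exit
            self.results.append(item * 2)

    def submit(self, x):
        self.tasks.put(x)

    def close(self, timeout=5.0):
        # Items queued before the sentinel are still processed (FIFO order),
        # so no submitted work is silently dropped.
        self.tasks.put(self._SENTINEL)
        self._thread.join(timeout)
        if self._thread.is_alive():
            raise RuntimeError("worker did not stop within timeout")

    # Context-manager support gives callers a deterministic shutdown point.
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


with Worker() as w:
    for i in range(5):
        w.submit(i)
print(w.results)  # [0, 2, 4, 6, 8]
```

Making the object a context manager means shutdown happens at a predictable point (the end of the with block) rather than whenever the object happens to be collected.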

bengioe · Feb 22 '23 19:02

Addressed by #116 and #117. I will leave this issue open as a reminder to run tests, but most problems on this front are presumably solved.

bengioe · Feb 06 '24 22:02

I did my best to flush all the queues, but I still think jobs occasionally freeze on Beluga (Compute Canada / Calcul Québec). I do not understand why it happens only on Beluga and not on Cedar, Narval, or Mila's cluster. Here is a snippet I'd use when running jobs on the clusters 🙃:

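```python
import os, signal

def haragiri(signum, frame):
    # When SIGALRM fires, send SIGTERM to this process so a frozen job exits.
    os.kill(os.getpid(), signal.SIGTERM)

signal.signal(signal.SIGALRM, haragiri)
signal.alarm(10 * 60)  # hard time limit: raise SIGALRM after 10 minutes
```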
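For diagnosing the rare freezes rather than only killing the job, a possible variant of the snippet above (a sketch, not something the gflownet code is known to use) is the standard-library faulthandler module: dump_traceback_later prints the stack of every thread after a timeout and, with exit=True, then exits the process. That could show where a hung Beluga job is stuck. The helper names arm_watchdog and disarm_watchdog are hypothetical. Note that it only reports threads of the process that armed it, so worker processes would need their own call.

```python
import faulthandler
import sys

WATCHDOG_SECONDS = 10 * 60  # same 10-minute budget as the SIGALRM snippet


def arm_watchdog(seconds=WATCHDOG_SECONDS):
    # After `seconds`, dump every thread's stack to stderr, then hard-exit.
    faulthandler.dump_traceback_later(seconds, exit=True, file=sys.stderr)


def disarm_watchdog():
    # Cancel the pending dump once the job has finished normally.
    faulthandler.cancel_dump_traceback_later()


arm_watchdog()
try:
    pass  # hypothetical: run the training / sampling job here
finally:
    disarm_watchdog()  # job finished (or raised): cancel the timer
```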

Another thing to consider is making the code work with other multiprocessing start methods. AFAIK, the spawn and forkserver start methods (selected via multiprocessing.set_start_method) do not currently work, even though they are the "recommended" ways of starting new processes.
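A minimal sketch of what spawn/forkserver compatibility generally requires, assuming nothing about gflownet's internals: worker targets must be picklable module-level functions, the entry point must sit behind an if __name__ == "__main__": guard, and a per-call context from multiprocessing.get_context can select the start method without changing the global default. The names square and run_with are illustrative.

```python
import multiprocessing as mp


def square(x):
    # Under spawn/forkserver, children re-import this module, so the target
    # must be a picklable, module-level function (no lambdas or closures).
    return x * x


def run_with(method):
    ctx = mp.get_context(method)  # per-call context; no global set_start_method
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Without this guard, each spawned child would re-run the pool creation
    # when it re-imports the main module.
    print(run_with("spawn"))  # [0, 1, 4, 9, 16]
```

forkserver is only available on POSIX platforms, whereas spawn is available everywhere, so spawn is the more portable choice for testing.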

SobhanMP · Feb 17 '24 05:02