lkpy
Parallel processing locks when the OOM killer comes for a worker
When LensKit is working in parallel (e.g. `batch.recommend`) and the OOM killer takes out a worker, the parent LensKit process will (sometimes) hang instead of terminating.
We should detect this case and abort the entire evaluation if the pool breaks down.
I have tried to reproduce this with processes that invoke `os.kill(os.getpid(), 9)`, and the parent process terminates correctly.
OOM-induced deadlocks in Python multiprocessing seem to be one of the bugs fixed in `concurrent.futures.ProcessPoolExecutor` in Python 3.7 and newer, and we saw this on Python 3.8.
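
For reference, here is a minimal sketch of the kind of detection described above, using only the standard library. It is not LensKit code: `_work` and `run_batch` are hypothetical names, and the worker death is simulated with `SIGKILL` (which, as noted, did not reproduce the hang), so this only illustrates the desired abort behavior. It assumes a Unix system and forces the `fork` start method.

```python
import multiprocessing
import os
import signal
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def _work(item):
    # Hypothetical stand-in for per-user work.
    # Item 3 simulates a worker being killed abruptly (as the OOM killer would).
    if item == 3:
        os.kill(os.getpid(), signal.SIGKILL)
    return item * 2


def run_batch(items, n_jobs=2):
    """Map _work over items; abort with RuntimeError if a worker dies."""
    # Assumption: Unix-only 'fork' context, chosen to keep the sketch self-contained.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=n_jobs, mp_context=ctx) as pool:
        try:
            return list(pool.map(_work, items))
        except BrokenProcessPool as e:
            # The executor marks pending futures as broken when a worker exits abruptly.
            raise RuntimeError("worker process died; aborting evaluation") from e
```

With this shape, the parent turns a broken pool into a single error that callers can treat as fatal for the whole evaluation, rather than waiting on results that will never arrive.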