billiard
BUG: Pool not closing pipes after close and join
I'm using billiard.Pool to run parallel computations. I have a continuous service (Celery) doing this, and after a while I get OSError 24 ("Too many open files"). While debugging I found that a Pool creates a number of pipes in pool.py, in _setup_queues():
self._inqueue = self._ctx.SimpleQueue()   # opens 2 pipe file descriptors
self._outqueue = self._ctx.SimpleQueue()  # opens 2 pipe file descriptors
and then 2 more for each process in the Pool.
When joining, the pool closes the pipes for each of the processes, but not the 4 belonging to _inqueue and _outqueue. Those stay open until the main process exits. Code to reproduce:
import billiard as mp
import os

def f(a):
    return a + 1

if __name__ == '__main__':
    pid = os.getpid()
    get_number_of_conns = os.popen(f'ls -l /proc/{pid}/fd | wc -l').read()
    print(f'At the beginning we only have {get_number_of_conns.strip()} connections')

    for i in range(10):
        # creating a pool
        pool = mp.Pool(mp.cpu_count() - 1)
        # running a job
        result = pool.map(f, range(5))
        # closing the pool and joining
        pool.close()
        pool.join()
        # getting number of open connections
        get_number_of_conns = os.popen(f'ls -l /proc/{pid}/fd | wc -l').read()
        print(f'Open connections: {get_number_of_conns.strip()}')
Any movement on this? Seems like an easy fix
I don't think there was. We have so much on our plate and we don't get to address all the issues.
If you have a solution, please provide us with a PR.
We are also experiencing this problem, which prevents us from using the implemented multiprocessing solution and forces us to use a 3rd party one (multiprocess). Could you please investigate this behaviour?
I found a POSSIBLE solution: use maxtasksperchild=, as suggested in https://stackoverflow.com/questions/21485319/high-memory-usage-using-python-multiprocessing
Can somebody improve on it quickly? Or can I? A minimal sketch of the workaround is below.
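For reference, this is a rough sketch of that workaround, assuming billiard.Pool accepts the same maxtasksperchild argument as the stdlib multiprocessing.Pool (the value 10 is arbitrary). Note that recycling workers releases descriptors held by the worker processes; it does not necessarily release the parent-side _inqueue/_outqueue pipes described above.

import billiard as mp

def f(a):
    return a + 1

if __name__ == '__main__':
    # maxtasksperchild=10 replaces each worker after it has run 10 tasks,
    # releasing whatever resources that worker process was holding.
    pool = mp.Pool(processes=mp.cpu_count() - 1, maxtasksperchild=10)
    try:
        print(pool.map(f, range(5)))
    finally:
        pool.close()
        pool.join()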
We're not using multiprocessing at all.
I took a quick look and it does seem that the Pipe object should be closed if __del__ is ever called.
See 👇
https://github.com/celery/billiard/blob/0391a4bfe121345f2961b2475e198399aaebceee/billiard/connection.py#L155-L159
Unfortunately, we seem to have a cyclic reference somewhere which prevents __del__ from being called.
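To help track that down, here is a rough diagnostic sketch (not part of billiard). It assumes billiard.connection exposes a Connection class the way CPython's multiprocessing.connection does, and lists the connections still alive after close()/join() together with the types of the objects that still refer to them.

import gc

import billiard as mp
from billiard.connection import Connection

def f(a):
    return a + 1

if __name__ == '__main__':
    pool = mp.Pool(2)
    pool.map(f, range(5))
    pool.close()
    pool.join()

    # Rule out objects that are merely awaiting collection.
    gc.collect()

    # Any Connection still alive here is being kept alive by something;
    # its referrers point at the cycle (or the lingering reference).
    for obj in gc.get_objects():
        if isinstance(obj, Connection):
            print(obj, '<-', [type(r).__name__ for r in gc.get_referrers(obj)])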
Oh, this gets worse: terminating the pool requires the outqueue. The best I can do here is to also close those pipes when terminating.
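For the record, a rough sketch of that idea (not the actual patch) could look like the hypothetical helper below. It assumes billiard's SimpleQueue exposes its underlying Connection objects as _reader and _writer, as CPython's multiprocessing.SimpleQueue does, and it should only run once the pool no longer needs the queues, i.e. after terminate() and join().

def close_pool_pipes(pool):
    # Hypothetical helper: explicitly close the reader/writer ends of the
    # pool's task and result queues once the pool is completely finished.
    for queue in (pool._inqueue, pool._outqueue):
        for conn in (queue._reader, queue._writer):
            if not conn.closed:
                conn.close()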
With the fix I currently have you'll have to do the following:
import gc
import billiard as mp
import os

def f(a):
    return a + 1

if __name__ == '__main__':
    pid = os.getpid()
    get_number_of_conns = os.popen(f'ls -l /proc/{pid}/fd | wc -l').read()
    print(f'At the beginning we only have {get_number_of_conns.strip()} connections')

    for i in range(10):
        # creating a pool
        pool = mp.Pool(mp.cpu_count() - 1)
        # running a job
        result = pool.map(f, range(5))
        # closing the pool, joining and terminating
        pool.close()
        pool.join()
        pool.terminate()
        # getting number of open connections
        get_number_of_conns = os.popen(f'ls -l /proc/{pid}/fd | wc -l').read()
        print(f'Open connections: {get_number_of_conns.strip()}')

    get_number_of_conns = os.popen(f'ls -l /proc/{pid}/fd | wc -l').read()
    print(f'At the end we only have {get_number_of_conns.strip()} connections')
I pushed that fix.
@clanzett Can you check my partial fix?
I am on it. Unfortunately this could take a while, because the error only shows up after approximately 1 hour of runtime of our jobs. I will keep you posted. Anyway, thanks for the fix!!
@thedrow: OK. Your change seems to fix the problem. Great job and many thanks!! Is there an ETA for when the new master branch will find its way into a new Python package?
I'm working on a Celery release, so very soon.
Is there any progress with this one? Stuck with the issue, too. Thanks!
Update: I mixed things up a bit; I'm actually stuck on #217, a similar issue.
I have the same issue. Performance was very good when I used the pool with the with keyword, but when I switched to terminating the pool explicitly as suggested, performance went down.
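For comparison, the two usage patterns being discussed look roughly like this, assuming billiard.Pool supports the context-manager protocol like CPython's multiprocessing.Pool, whose __exit__ calls terminate():

import billiard as mp

def f(a):
    return a + 1

if __name__ == '__main__':
    # Context-manager form: terminate() runs automatically on exit.
    with mp.Pool(2) as pool:
        print(pool.map(f, range(5)))

    # Explicit form from the workaround above: close, join, then terminate.
    pool = mp.Pool(2)
    try:
        print(pool.map(f, range(5)))
    finally:
        pool.close()
        pool.join()
        pool.terminate()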
I currently don't have a better fix for this problem. Feel free to suggest one.
@celery/core-developers This is a big problem for us. If anyone has the time to investigate, please do.