grobid_client_python

Why use ProcessPoolExecutor at all?

Open • iiLaurens opened this issue 2 years ago • 1 comment

The code comments mention that ProcessPoolExecutor is used in favour of ThreadPoolExecutor, citing the Python GIL as one of the reasons. I would like to argue that ThreadPoolExecutor is perfectly fine in this use case.

First of all, the GIL is only a problem for threads when they execute Python code. It allows only one thread at a time to run Python bytecode, so CPU-bound threads cannot run in parallel and extra threads add nothing. For I/O, however, Python releases the GIL before making the blocking call (for example a socket read), the OS handles the request, and the waiting thread stays blocked while the other threads continue.
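A minimal sketch of this effect, using time.sleep as a stand-in for a blocking network call (like socket I/O, it releases the GIL while waiting). The helper names fake_io and run_concurrently are hypothetical. Four 0.5 s waits run in a thread pool should finish in roughly 0.5 s rather than the 2 s a sequential loop would take:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay: float) -> float:
    # Stand-in for a blocking request: time.sleep releases the GIL while it waits,
    # so other threads are free to run in the meantime.
    time.sleep(delay)
    return delay


def run_concurrently(n_tasks: int = 4, delay: float = 0.5) -> float:
    """Run n_tasks blocking waits in a thread pool and return the wall-clock time taken."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        list(pool.map(fake_io, [delay] * n_tasks))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_concurrently()
    print(f"4 blocking waits of 0.5 s each took {elapsed:.2f} s in total")
```

The same experiment with a CPU-bound loop instead of time.sleep would show no such overlap on a standard (GIL-enabled) CPython build.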

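The grobid client is simply a wrapper that sends a batch of POST requests to the GROBID server. No heavy computation is done on the Python side, so ThreadPoolExecutor is perfectly fine here: it has much less overhead (no worker processes to start, no pickling of arguments and results) and is much less troublesome across operating systems (no `if __name__ == "__main__":` guard is needed on platforms where worker processes are spawned rather than forked). Would it be possible to make ThreadPoolExecutor the default?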
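As a rough illustration (not the client's actual code), a thread-pool batch of POST requests could look like this. It assumes a GROBID server at http://localhost:8070 exposing /api/processFulltextDocument, with the PDF sent in a multipart form field named input; post_pdf, process_batch and the pdfs/ folder are hypothetical names. Only the standard library is used, so the multipart body is built by hand:

```python
import os
import urllib.request
import uuid
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Union

# Assumption: a GROBID server running locally on its default port.
GROBID_URL = "http://localhost:8070/api/processFulltextDocument"


def post_pdf(path: str, url: str = GROBID_URL, timeout: float = 60.0) -> str:
    """Send one PDF as multipart/form-data (field "input") and return the response body."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as f:
        pdf_bytes = f.read()
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="input"; filename="{os.path.basename(path)}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    request = urllib.request.Request(
        url,
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    # The worker thread blocks here on network I/O, with the GIL released.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def process_batch(
    paths: Iterable[str],
    send: Callable[[str], str] = post_pdf,
    n_workers: int = 10,
) -> Dict[str, Union[str, Exception]]:
    """Run send() on every path in a thread pool; map each path to its result or its error."""
    results: Dict[str, Union[str, Exception]] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = {pool.submit(send, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep processing the batch
                results[path] = exc
    return results


if __name__ == "__main__":
    import glob

    # Hypothetical input folder of PDFs.
    outcome = process_batch(glob.glob("pdfs/*.pdf"), n_workers=10)
    ok = sum(not isinstance(r, Exception) for r in outcome.values())
    print(f"{ok}/{len(outcome)} documents processed")
```

Since each worker spends nearly all its time waiting on the server, threads overlap well here. n_workers only caps the number of in-flight requests; raising it far beyond what the server can process concurrently is unlikely to speed things up.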

iiLaurens, Jul 04 '22 15:07

Hi @iiLaurens!

Thank you for the issue; you're absolutely right. I am actually using ThreadPoolExecutor in my more recent Python clients for I/O-intensive tasks. I think that when I wrote this client (4 years ago), I was a bit confused about this aspect and never came back to it afterwards.

I pushed an update replacing ProcessPoolExecutor with ThreadPoolExecutor; see commit e7710c205601af29889a0e6e23287ec265e038c7.
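For readers wondering why such a swap is small: both classes implement the concurrent.futures.Executor interface (submit, map, context manager), so code written against that interface only changes which class it instantiates. This is a generic sketch, not the content of commit e7710c2; run_all and square are hypothetical names:

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, Type


def run_all(executor_cls: Type[Executor], fn: Callable, items: Iterable, max_workers: int = 4) -> List:
    # ThreadPoolExecutor and ProcessPoolExecutor share map()/submit(),
    # so switching between them only changes the class passed in here.
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


def square(x: int) -> int:
    # Defined at module level so it is picklable, which ProcessPoolExecutor requires.
    return x * x


if __name__ == "__main__":
    print(run_all(ThreadPoolExecutor, square, range(5)))   # [0, 1, 4, 9, 16]
    # The process variant needs this __main__ guard on platforms that spawn workers.
    print(run_all(ProcessPoolExecutor, square, range(5)))  # [0, 1, 4, 9, 16]
```

With threads, neither the picklability requirement nor the __main__ guard applies, which is part of why the thread pool is the simpler default for an I/O-bound client.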

kermitt2, Jul 04 '22 16:07