anyio

Thread race condition in _eventloop.get_asynclib() (with fix)

Open danielrobbins opened this issue 2 years ago • 6 comments

I have a threaded web spider using httpx that triggers a race condition in the get_asynclib() function in anyio/_core/_eventloop.py. The race can be triggered when many threads each initialize their own asyncio event loop and initialize httpx at the same time. It happens intermittently: sometimes the race is triggered, sometimes it is not. When it is, the threads catch get_asynclib() mid-initialization, and certain attributes that should be present are not yet available. I am using anyio 3.5.0, httpx 0.22.0 and Python 3.7.

httpx-exception.txt

An ugly but functioning patch which I can confirm resolves the issue is below:

get_asynclib_thread_fix.patch.txt

In my test patch, the lock covers essentially the entire get_asynclib() call. It could potentially be made tighter, though I'm not sure the benefit would be worth it.
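The attached patch is not reproduced here. As a rough illustration of the idea only, the sketch below serializes a "look up in sys.modules, otherwise import" step behind a single lock. It assumes get_asynclib() has roughly that shape; the names `_backend_lock` and `get_backend_locked` are hypothetical and do not come from anyio.

```python
import sys
import threading
from importlib import import_module
from types import ModuleType

# Hypothetical module-level lock; not part of anyio.
_backend_lock = threading.Lock()


def get_backend_locked(modulename: str) -> ModuleType:
    """Return the named module, serializing the lookup-or-import step.

    While one thread is importing the module under the lock, other
    callers of this function wait instead of reading a partially
    initialized entry out of sys.modules.
    """
    with _backend_lock:
        try:
            return sys.modules[modulename]
        except KeyError:
            return import_module(modulename)
```

The lock only helps if every caller goes through this function, and it is taken on every call, which is the cost on the "hot" path that a tighter lock would try to avoid.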

danielrobbins avatar Mar 09 '22 05:03 danielrobbins

I'm very hesitant to add locking to get_asynclib() which is "hot" code. I'll take a look at this and see if I can find a better solution.

agronholm avatar Mar 09 '22 11:03 agronholm

I do question the use case though – why would you run multiple event loops?

agronholm avatar May 08 '22 09:05 agronholm

If you are using threads, each thread needs its own event loop. So if you ever use ThreadPoolExecutor and want to use async code inside it, each worker thread needs its own event loop. That is the use case in my code. Each thread performs network operations, so to use httpx I need an event loop to exist for the lifetime of the thread.
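A minimal sketch of the pattern described above, using only the standard library: each worker thread in a ThreadPoolExecutor gets its own event loop, created once by the pool's initializer and reused for every task on that thread. The names `fetch`, `worker`, `run_all` and `_local` are illustrative, and `fetch` is a stand-in for an async network call such as an httpx request.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List

# Per-thread storage for the event loop (illustrative name).
_local = threading.local()


def _init_loop() -> None:
    # Runs once in each worker thread when the pool starts it.
    _local.loop = asyncio.new_event_loop()


async def fetch(n: int) -> int:
    # Stand-in for an async network operation.
    await asyncio.sleep(0)
    return n * 2


def worker(n: int) -> int:
    # Reuse this thread's own loop for every task it handles.
    return _local.loop.run_until_complete(fetch(n))


def run_all(items: Iterable[int]) -> List[int]:
    with ThreadPoolExecutor(max_workers=4, initializer=_init_loop) as pool:
        return list(pool.map(worker, items))
```

For brevity the loops are never closed here; a longer-lived program would want to close each loop before its thread exits.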

danielrobbins avatar May 08 '22 17:05 danielrobbins

I wonder if this problem would still occur if you were to use ProcessPoolExecutor with the 'spawn' context rather than threads. That way each event loop would run in a separate memory space, so there would be no chance of threads racing against each other over shared state. I think separate processes are the correct solution here in any case, to give you true parallelism.
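A sketch of what that suggestion could look like with the standard library, assuming the work can be expressed as a picklable top-level function. The names `fetch`, `worker` and `run_in_processes` are illustrative, and `fetch` again stands in for the real network call.

```python
import asyncio
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from typing import Iterable, List


async def fetch(n: int) -> int:
    # Stand-in for an async network operation.
    await asyncio.sleep(0)
    return n * 2


def worker(n: int) -> int:
    # Runs in a child process; asyncio.run() creates a fresh event
    # loop for this call and closes it afterwards.
    return asyncio.run(fetch(n))


def run_in_processes(items: Iterable[int]) -> List[int]:
    # 'spawn' starts each worker as a fresh interpreter, so workers
    # share no module state with the parent or with each other.
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return list(pool.map(worker, items))
```

With the 'spawn' start method, `run_in_processes()` should be called from under an `if __name__ == "__main__":` guard, because each child re-imports the main module on startup.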

laker-93 avatar Oct 17 '23 08:10 laker-93

Rather than using sys.modules, you can import into a global and return that.

graingert avatar Oct 17 '23 09:10 graingert