Change ThreadLock to ThreadRLock to resolve rare deadlock
Summary
This pull request addresses a rare edge case in which a thread deadlocks while accessing _optional_thread_lock in ConnectionPool. The fix changes ThreadLock to ThreadRLock, which allows reentrant locking and so resolves the deadlock.
We have encountered this deadlock in our production environment while using httpcore in a multithreaded setup. In rare cases, threads wait indefinitely to acquire the lock, causing the entire worker to hang. The deadlock occurs when the same thread attempts to acquire the lock a second time without having released it.
This is important to meet the library's goal of being thread safe.
For more details, please refer to the discussion here. Other users of the library appear to be hitting the same issue, although it is described less clearly, for example in https://github.com/encode/httpcore/discussions/997.
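The difference between the two lock types can be sketched with the standard library alone. This is a minimal illustration, not httpcore's code: it uses threading.Lock and threading.RLock directly rather than the ThreadLock and ThreadRLock wrappers, and try_reacquire is a hypothetical helper. A timeout is used on the second acquire so the example reports the would-be deadlock instead of hanging.

```python
import threading


def try_reacquire(lock, timeout=0.1):
    """Hold `lock`, then try to acquire it again from the same thread.

    Returns True if the second acquire succeeds, False if it times out
    (which, without a timeout, would be a deadlock).
    """
    with lock:
        acquired_again = lock.acquire(timeout=timeout)
        if acquired_again:
            # Balance the extra acquire before leaving the `with` block.
            lock.release()
        return acquired_again


print(try_reacquire(threading.Lock()))   # non-reentrant: second acquire times out
print(try_reacquire(threading.RLock()))  # reentrant: second acquire succeeds
```

With a plain lock the second acquire can never succeed on the same thread, whereas a reentrant lock tracks its owner and a recursion count, so nested acquisition by the owning thread is allowed.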
Checklist
- [x] I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
- [x] I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
- [x] I've updated the documentation accordingly.
@tomchristie Is the team aware of this PR? I'd love to get an opinion on this PR because we still see this issue pop up. (https://github.com/encode/httpcore/discussions/990)
I tested it, and this pull request doesn't seem to work for https://github.com/encode/httpcore/issues/1029 ?
Could be. The issue we experience is not really related to large files, so your problem likely has a different root cause.
Unfortunately, after quite some time investigating, we still do not know the exact condition that causes this.
I wouldn't make this change without a clear understanding of why a re-entrant lock would be required here.
Incidentally: the httpx 1.0 prerelease has a simpler stack here. I'd be more inclined to put my time into pushing that forward. https://www.encode.io/httpnext/
We are seeing this issue sporadically as well. Specifically, it seems to occur when an exception is thrown in the middle of HTTP response streaming, followed by an immediate retry. Unfortunately, we haven't been able to reproduce it reliably based on these factors alone, but I managed to grab a few thread dumps, and they clearly show an attempt to acquire the lock from within _assign_requests_to_connections, which already holds that lock.
File "/opt/venv/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 182, in parse
return self._post(
File "/opt/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1259, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/opt/venv/lib/python3.12/site-packages/openai/_base_client.py", line 982, in request
response = self._client.send(
File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 914, in send
response = self._send_handling_auth(
File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 979, in _send_handling_redirects
response = self._send_single_request(request)
File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 1014, in _send_single_request
response = transport.handle_request(request)
File "/opt/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 250, in handle_request
resp = self._pool.handle_request(req)
File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 228, in handle_request
closing = self._assign_requests_to_connections()
File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 294, in _assign_requests_to_connections
and len([connection.is_idle() for connection in self._connections])
File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection.py", line 192, in is_idle
def is_idle(self) -> bool:
File "/opt/venv/lib/python3.12/site-packages/httpx/_models.py", line 900, in iter_bytes
yield chunk
File "/opt/venv/lib/python3.12/site-packages/httpx/_models.py", line 954, in iter_raw
yield chunk
File "/opt/venv/lib/python3.12/site-packages/httpx/_client.py", line 154, in __iter__
yield chunk
File "/opt/venv/lib/python3.12/site-packages/httpx/_transports/default.py", line 128, in __iter__
yield part
File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 406, in __iter__
self.close()
File "/opt/venv/lib/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 416, in close
with self._pool._optional_thread_lock:
File "/opt/venv/lib/python3.12/site-packages/httpcore/_synchronization.py", line 268, in __enter__
self._lock.acquire()
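One possible reading of the dump above is that cleanup of an abandoned streaming generator runs on the same thread while the pool lock is already held. The sketch below is only an illustration of that pattern under stated assumptions: stream_chunks and cleanup_while_locked are hypothetical names, the plain threading locks stand in for httpcore's wrappers, and the generator is closed explicitly so the behaviour is deterministic, whereas in the dump the cleanup appears to be triggered implicitly.

```python
import threading


def stream_chunks(lock, results):
    """Hypothetical stand-in for a streaming response body."""
    try:
        yield b"chunk"
    finally:
        # Cleanup needs the pool lock, similar to close() in the dump.
        # A timeout is used so the example reports failure instead of hanging.
        acquired = lock.acquire(timeout=0.1)
        results.append(acquired)
        if acquired:
            lock.release()


def cleanup_while_locked(lock):
    """Return True if generator cleanup could take `lock` while it is held."""
    results = []
    gen = stream_chunks(lock, results)
    next(gen)        # start streaming; the generator suspends at the yield
    with lock:       # the pool takes its lock on this thread ...
        gen.close()  # ... and the generator's cleanup runs on the same thread
    return results[0]


print(cleanup_while_locked(threading.Lock()))   # cleanup cannot get the lock
print(cleanup_while_locked(threading.RLock()))  # cleanup re-enters the lock
```

If this is what happens in practice, a reentrant lock would let the nested cleanup proceed, though it would not by itself explain which conditions trigger the cleanup at that point.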