Child worker events send deadlocks when subthreads write logs while the async predictor yields large outputs
cog version 0.14.4, async predict, with concurrency.max = 8. The async predict function returns an async generator that yields a base64-encoded image at a high frame rate (10 FPS).
I found that the main thread of the child worker gets stuck in _apredict, at AsyncConnection.send,
and finally blocks in multiprocessing/connection.py, in Connection._send:
def _send(self, buf, write=_write):
    remaining = len(buf)
    while True:
        n = write(self._handle, buf)
        remaining -= n
        if remaining == 0:
            break
        buf = buf[n:]
n = write(self._handle, buf)
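For context on how a write like this can block at all: a blocking write to a pipe only returns once the kernel buffer has room, so if the other end is not reading, the writer waits. The sketch below is not cog code. It assumes a plain POSIX pipe from `os.pipe()` (a duplex `multiprocessing.Pipe` uses a socket pair on Unix, which has a different buffer size), and `pipe_capacity` is a hypothetical helper name. It switches the write end to non-blocking mode so the "would block" condition shows up as an exception instead of a hang.

```python
import os


def pipe_capacity(chunk: int = 4096) -> int:
    """Return how many bytes an unread POSIX pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode turns "would block" into BlockingIOError instead of a hang.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk)
    except BlockingIOError:
        # The kernel buffer is full and nobody is reading from read_fd.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


if __name__ == "__main__":
    print(f"pipe accepted {pipe_capacity()} bytes before a write would block")
```

On Linux the number is often around 64 KiB, which a single base64-encoded frame can exceed, so the sender depends on the parent process draining the connection promptly.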
It seems to happen when several different threads write to the logger frequently while large outputs are being yielded.
Maybe we should keep using LockedConnection instead of replacing it with AsyncConnection, as is done here:
async def _aloop(
    self,
    predict: Callable[..., Any],
    redirector: SimpleStreamRedirector,
) -> None:
    # Unwrap and replace the events connection with an async one.
    assert isinstance(self._events, LockedConnection)
    self._events = AsyncConnection(self._events.connection)
This seems difficult to handle, because it requires supporting concurrent send calls from coroutines within the same thread as well as concurrent send calls from other threads.
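One possible shape for that, as a rough sketch only and not how cog's LockedConnection or AsyncConnection are implemented: keep a single `threading.Lock` around the underlying connection, let threads call a blocking `send`, and let coroutines call an `asend` that runs the same blocking `send` in the default executor so the event loop is never blocked. `SharedSender` and `demo` are hypothetical names for illustration.

```python
import asyncio
import threading
from multiprocessing import Pipe
from multiprocessing.connection import Connection
from typing import Any, List


class SharedSender:
    """Hypothetical wrapper: one lock serializes sends from threads and coroutines."""

    def __init__(self, conn: Connection) -> None:
        self._conn = conn
        self._lock = threading.Lock()

    def send(self, obj: Any) -> None:
        # Blocking path, e.g. for log-writing threads.
        with self._lock:
            self._conn.send(obj)

    async def asend(self, obj: Any) -> None:
        # Coroutine path: the blocking send runs in a worker thread,
        # so the event loop stays free while the pipe is full.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, self.send, obj)


def demo(n_logs: int = 50, n_outputs: int = 8, size: int = 1_000_000) -> List[Any]:
    """Send small log events from a thread and large outputs from coroutines."""
    reader, writer = Pipe(duplex=False)
    sender = SharedSender(writer)
    total = n_logs + n_outputs
    received: List[Any] = []

    def drain() -> None:
        # Stands in for the parent process reading events.
        for _ in range(total):
            received.append(reader.recv())

    def log_writer() -> None:
        for i in range(n_logs):
            sender.send(("log", i))

    async def predictor() -> None:
        await asyncio.gather(
            *(sender.asend(("output", "x" * size)) for _ in range(n_outputs))
        )

    consumer = threading.Thread(target=drain)
    logger = threading.Thread(target=log_writer)
    consumer.start()
    logger.start()
    asyncio.run(predictor())
    logger.join()
    consumer.join()
    return received
```

The trade-off is that every in-flight async send occupies an executor thread until the reader drains the pipe, so with many concurrent predictions the default thread pool could become the bottleneck.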
Would you mind upgrading cog to the latest version, adding fast: true to the build section of your cog.yaml, and seeing whether you run into the same issue? That uses a different async runner, and I'd be curious to see if it has the same problem.