Child worker events send deadlocks when subthreads write logs while the async predictor yields large outputs

Open sinopec opened this issue 8 months ago • 2 comments

cog version 0.14.4, async predict, with concurrency.max = 8. The async predict function returns an async generator that yields base64-encoded images at a high rate (10 FPS).

I found that the main thread of the child worker is stuck in _apredict, at AsyncConnection.send,

and finally in Connection._send in multiprocessing/connection.py:

    def _send(self, buf, write=_write):
        remaining = len(buf)
        while True:
            n = write(self._handle, buf)
            remaining -= n
            if remaining == 0:
                break
            buf = buf[n:]

The thread blocks on this call:

    n = write(self._handle, buf)

sinopec avatar Apr 22 '25 05:04 sinopec

It seems to happen when several different threads write to the logger frequently while large outputs are being yielded.

Maybe we should keep using LockedConnection instead of replacing it with AsyncConnection, as is done here:

    async def _aloop(
        self,
        predict: Callable[..., Any],
        redirector: SimpleStreamRedirector,
    ) -> None:
        # Unwrap and replace the events connection with an async one.
        assert isinstance(self._events, LockedConnection)
        self._events = AsyncConnection(self._events.connection)

This seems difficult to handle, because it requires supporting concurrent send calls from coroutines within the same thread as well as concurrent send calls from other threads.
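
To make the requirement concrete, here is a minimal sketch of one possible approach. It is not cog's code: ThreadSafeAsyncSender, asend and demo are hypothetical names, and the internals of LockedConnection and AsyncConnection are not reproduced. The idea is that a single threading.Lock guards every send, and coroutines hand the blocking send to the default executor so the event loop is never blocked while the pipe is full.

```python
import asyncio
import threading
from multiprocessing import Pipe
from typing import Any, List


class ThreadSafeAsyncSender:
    """Hypothetical wrapper (not part of cog): one lock serializes all sends."""

    def __init__(self, connection: Any) -> None:
        self._connection = connection
        self._lock = threading.Lock()

    def send(self, obj: Any) -> None:
        # Path for ordinary threads, e.g. a thread forwarding log lines.
        with self._lock:
            self._connection.send(obj)

    async def asend(self, obj: Any) -> None:
        # Path for coroutines: the blocking send runs in a worker thread,
        # so a full pipe stalls that worker rather than the event loop.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, self.send, obj)


def demo(n_logs: int = 200, n_outputs: int = 20, size: int = 200_000) -> List[Any]:
    """Send small log events and large outputs concurrently over one pipe."""
    reader, writer = Pipe(duplex=False)
    sender = ThreadSafeAsyncSender(writer)
    total = n_logs + n_outputs
    received: List[Any] = []

    def drain() -> None:
        # Stands in for the parent process reading events.
        for _ in range(total):
            received.append(reader.recv())

    def write_logs() -> None:
        for i in range(n_logs):
            sender.send(("log", f"line {i}"))

    async def predict() -> None:
        # Several coroutines sending large payloads at the same time.
        await asyncio.gather(
            *(sender.asend(("output", "x" * size)) for _ in range(n_outputs))
        )

    threads = [threading.Thread(target=drain), threading.Thread(target=write_logs)]
    for t in threads:
        t.start()
    asyncio.run(predict())
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    print(len(demo()), "messages received intact")
```

The trade-off in this sketch is that each large send occupies an executor thread for as long as it holds the lock, so it favors correctness over throughput. Whether that is acceptable for cog's worker is an open question.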

sinopec avatar Apr 22 '25 09:04 sinopec

Would you mind upgrading cog to the latest version, adding fast: true to the build section of your cog.yaml, and checking whether you run into the same issue? This uses a different async runner, and I'd be curious to see if it has the same problem.

8W9aG avatar Apr 23 '25 20:04 8W9aG