replicate-python

API call from deployment to deployment hangs forever

Open · Clement-Lelievre opened this issue 8 months ago • 2 comments

Hi,

I'm having an issue that I can't reproduce locally; it happens in the following scenario:

  • I have two cog models deployed on Replicate (as Deployments)
  • one of them at some point calls the other (see snippet below)
  • they were built and deployed with cog==0.13.7, replicate==1.0.4, the cog CLI 0.14.3, Python 3.11, and Ubuntu 22.04

Here's how I call one deployment from the other:

import logging

import replicate
from replicate.helpers import base64_encode_file

logger = logging.getLogger(__name__)

vectorizer_deployment = replicate.deployments.get(VECTORIZER_DEPLOYMENT)

with open(img_path, "rb") as f:
    b64 = base64_encode_file(f)

prediction = vectorizer_deployment.predictions.create(
    input={"images": [b64]},
)
logger.debug(f"{prediction.id=}")
prediction.wait()  # this line hangs forever after 30-ish GET requests

The called deployment does complete the inference, and I can see the status as succeeded on Replicate. In the logs of the calling deployment, I can see about 30-ish GET requests, all looking like INFO:httpx:HTTP Request: GET https://api.replicate.com/v1/predictions/7atmc23wmsrga0cp7ag9y5s6pm "HTTP/1.1 200 OK"

I have investigated the replicate Python client source code: the prediction.wait() method calls the .reload() method, which itself performs the GET requests. I've tried increasing the env var REPLICATE_POLL_INTERVAL, but to no effect.
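For reference, the wait-and-reload behavior described above can be approximated like this. This is a rough sketch of the polling loop, not the client's actual implementation, and the stub prediction object used to exercise it is hypothetical:

```python
import os
import time

def wait_sketch(prediction, poll_interval=None):
    # Rough sketch of the polling loop described above: reload the
    # prediction until it reaches a terminal status. Not the actual
    # replicate-python implementation.
    if poll_interval is None:
        poll_interval = float(os.environ.get("REPLICATE_POLL_INTERVAL", "0.5"))
    while prediction.status not in ("succeeded", "failed", "canceled"):
        time.sleep(poll_interval)
        prediction.reload()  # one GET per iteration, as seen in the logs
    return prediction
```

If the reloaded status never becomes terminal from the caller's point of view (or the GET responses stop arriving), a loop like this never exits, which matches the hang described here.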

Strange thing is, as said above, it works locally, i.e.:

  • when I run the main endpoint locally in Python (e.g. predictor.predict(...)), everything works well
  • when I run locally with cog predict -i ..., inference goes through, but after it completes I get this error log:

    {"logger": "cog.server.worker", "timestamp": "2025-04-15T19:11:52.878929Z", "severity": "ERROR", "message": "unhandled error in _consume_events"}

    Traceback (most recent call last):
      File "/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/cog/server/worker.py", line 299, in _consume_events
        self._consume_events_inner()
      File "/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/cog/server/worker.py", line 337, in _consume_events_inner
        ev = self._events.recv()
      File "/root/.pyenv/versions/3.11.10/lib/python3.11/multiprocessing/connection.py", line 251, in recv
        return _ForkingPickler.loads(buf.getbuffer())
    TypeError: URLPath.__init__() missing 3 required keyword-only arguments: 'source', 'filename', and 'fileobj'

So far I'm clueless as to why everything suddenly hangs, which makes my whole project unusable. I suspect it's something about the deployed environment.

@zeke @erbridge @meatballhat @aron @mattt

thanks for your help

Clement-Lelievre avatar Apr 15 '25 18:04 Clement-Lelievre

Hi @Clement-Lelievre πŸ‘‹

Thanks for the detailed report!

This behavior is likely due to network restrictions or internal timeouts in the deployed environment, especially when one deployment tries to poll another repeatedly.

Here are a few things to try:

  1. Register a webhook when creating the prediction to avoid active polling (note that the webhook parameters take a URL, not a boolean):

     prediction = vectorizer_deployment.predictions.create(
         input={"images": [b64]},
         webhook="https://your-server.example/replicate-webhook",
         webhook_events_filter=["completed"],
     )

Then handle the result via webhook, or periodically check the final status externally.
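For illustration, a minimal webhook receiver could look like the sketch below, using only the standard library. The port and the handle_payload helper are hypothetical; Replicate POSTs the prediction object as JSON to whichever webhook URL you register:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_payload(payload: dict) -> str:
    # Hypothetical helper: summarize the prediction object that
    # Replicate POSTs to the registered webhook URL.
    status = payload.get("status", "unknown")
    if status == "succeeded":
        return f"prediction {payload.get('id')} finished"
    return f"prediction {payload.get('id')} is {status}"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print(handle_payload(payload))
        self.send_response(200)  # acknowledge so Replicate stops retrying
        self.end_headers()

# To run (needs a publicly reachable URL for Replicate to call):
# HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

The receiver must be reachable from the public internet, so in practice it would sit behind a tunnel or a deployed endpoint rather than localhost.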

  2. Avoid a synchronous .wait() in production deployments — it’s better suited to local or CLI environments. Instead, check the prediction status in a non-blocking way, or poll with delays and a maximum number of retries.

  3. Double-check that both deployments are in the same region and are running compatible versions of replicate and cog.
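The "poll with delays and a maximum number of retries" suggestion above can be sketched as a small helper. The name wait_with_timeout and its callables are hypothetical; the idea is to pass in prediction.reload and a status check rather than blocking forever in prediction.wait():

```python
import time

def wait_with_timeout(reload, is_done, poll_interval=2.0, max_attempts=60):
    # Bounded alternative to prediction.wait(): poll at most
    # max_attempts times, then give up instead of hanging forever.
    for _ in range(max_attempts):
        reload()       # e.g. prediction.reload
        if is_done():  # e.g. lambda: prediction.status == "succeeded"
            return True
        time.sleep(poll_interval)
    return False
```

For example: wait_with_timeout(prediction.reload, lambda: prediction.status in ("succeeded", "failed", "canceled")), then log or raise when it returns False instead of hanging.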

Let me know if it helps β€” happy to dig deeper if needed! Thanks again πŸ™Œ

Ivan-developer0 avatar May 15 '25 15:05 Ivan-developer0

thanks; as it was urgent, I changed my approach to work around this issue. I'll revisit it when I have time and let you know

Clement-Lelievre avatar May 15 '25 19:05 Clement-Lelievre