dreambooth: Output Docker container fails to run predictions

I'm running my private Replicate Docker image on my own Google Cloud VM instance after running a Dreambooth training.

Unfortunately, this command fails:

curl http://localhost:5000/predictions -X POST -H "Content-Type: application/json" -d '{"input":{"prompt":"a photo of zwx man", "width": 512, "height": 512}}'

It quickly fails with just "Internal Server Error". However, the Docker container logs suggest that a prediction actually ran:

Using seed: 12294
using txt2img
INFO:     172.17.0.1:57596 - "POST /predictions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/routing.py", line 235, in app
    raw_response = await run_endpoint_function(
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/cog/server/http.py", line 94, in predict
    generic_response = runner.predict(request).get()
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/cog/server/runner.py", line 84, in predict
    handler.append_logs(event.message)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/cog/server/runner.py", line 142, in append_logs
    assert self.p.logs
AssertionError
  0%|          | 0/50 [00:00<?, ?it/s]
  2%|▏         | 1/50 [00:01<00:51,  1.05s/it]
  6%|▌         | 3/50 [00:01<00:14,  3.21it/s]
 12%|█▏        | 6/50 [00:01<00:06,  6.65it/s]
 18%|█▊        | 9/50 [00:01<00:04,  9.68it/s]
 24%|██▍       | 12/50 [00:01<00:03, 12.27it/s]
 30%|███       | 15/50 [00:01<00:02, 14.33it/s]
 36%|███▌      | 18/50 [00:01<00:02, 15.96it/s]
 42%|████▏     | 21/50 [00:02<00:01, 17.23it/s]
 48%|████▊     | 24/50 [00:02<00:01, 18.18it/s]
 54%|█████▍    | 27/50 [00:02<00:01, 18.87it/s]
 60%|██████    | 30/50 [00:02<00:01, 19.28it/s]
 66%|██████▌   | 33/50 [00:02<00:00, 19.60it/s]
 72%|███████▏  | 36/50 [00:02<00:00, 19.89it/s]
 78%|███████▊  | 39/50 [00:02<00:00, 20.11it/s]
 84%|████████▍ | 42/50 [00:03<00:00, 20.29it/s]
 90%|█████████ | 45/50 [00:03<00:00, 20.41it/s]
 96%|█████████▌| 48/50 [00:03<00:00, 20.48it/s]
100%|██████████| 50/50 [00:03<00:00, 14.49it/s]

What is going on at the HTTP layer here? Why can't I POST to my Docker container successfully?
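
One reading of the traceback above: the last frame is a bare `assert self.p.logs` in cog's `runner.py`. A bare assert raises `AssertionError` whenever its operand is falsy, which covers both `None` and the empty string. The sketch below reproduces that failure mode in isolation. The `Prediction` and `Handler` classes are hypothetical stand-ins, not cog's actual code, and the idea that `logs` was still empty or unset when the first log line arrived is an assumption drawn from the traceback, not something confirmed in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    # Hypothetical stand-in for the response object; not cog's real class.
    logs: Optional[str] = ""


class Handler:
    def __init__(self, prediction: Prediction) -> None:
        self.p = prediction

    def append_logs_truthy(self, message: str) -> None:
        # Same shape as the failing line: rejects None AND the empty string.
        assert self.p.logs
        self.p.logs += message

    def append_logs_not_none(self, message: str) -> None:
        # Narrower check: only rejects None, so an empty log buffer is fine.
        assert self.p.logs is not None
        self.p.logs += message


def try_append(handler: Handler, message: str) -> str:
    """Return "ok" or "AssertionError" instead of letting the assert propagate."""
    try:
        handler.append_logs_truthy(message)
        return "ok"
    except AssertionError:
        return "AssertionError"
```

If this reading is right, the prediction itself can run to completion (as the progress bar suggests) while the HTTP handler still returns a 500, because the failure happens while collecting log output rather than while generating the image.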

Feb 06 '23 06:02 achuinard

This might be an issue with the recent work adding async support to https://github.com/replicate/cog

@achuinard to reproduce this: I assume you trained this model recently? (Can you provide an ID from the dreambooth training API?)

Feb 07 '23 13:02 anotherjesse

Yes, these are all recent model trainings. One recent prediction ID is mc4st5d7ozhv5gc25olufyotpi.

Thanks for getting back to me, @anotherjesse.

Feb 07 '23 15:02 achuinard

Also, how is it working on Replicate if the cogs themselves are broken? You must be doing some magic!

Feb 07 '23 21:02 achuinard

Or I just need to start using the async header and letting Cog webhook me.

Feb 07 '23 21:02 achuinard

@achuinard - we have switched over to cog's "async" style:

- The async API for cog makes requests look the same as on Replicate, so users can switch between the two (local cog vs. Replicate) more easily.
- Async is better for larger systems :)

That said, if the sync API is breaking, that isn't good.

To clarify: the prediction you shared is a dreambooth training. Is the cog you are running locally:

- the cog image we built as part of the dreambooth API, which you downloaded and are running as-is?
- or your own dreambooth inference, built with cog directly from the downloaded weights? (Some of our users build their own cogs using https://github.com/replicate/dreambooth-template.)

Feb 08 '23 15:02 anotherjesse

@anotherjesse This is the Cog / Docker image that your Dreambooth API creates. I did not go through the process of building my own Cog.

Async makes sense, and I think I'm using it right:

curl http://localhost:5000/predictions -X POST -H "Prefer: respond-async" -H "Content-Type: application/json;charset=utf-8"  -d '{"input":{"prompt":"a photo of zwx man", "width": 512, "height": 512, "disable_safety_check": true}, "webhook":"https://en58kxviypofc.x.pipedream.net"}'

Still no luck though.
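
For comparison, here is a minimal Python sketch of the same async request as the curl command above, using only the standard library. The function names `build_async_prediction` and `send` are made up for illustration, and the webhook URL is whatever endpoint you control. The `send` helper returns the response body even on a 4xx/5xx reply, which may make it easier to see what the server actually says beyond "Internal Server Error".

```python
import json
import urllib.error
import urllib.request


def build_async_prediction(
    base_url: str, prediction_input: dict, webhook: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /predictions request with Prefer: respond-async."""
    body = json.dumps({"input": prediction_input, "webhook": webhook}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predictions",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Prefer": "respond-async",
        },
    )


def send(request: urllib.request.Request) -> tuple[int, str]:
    """Send the request and return (status, body), including for error replies."""
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error object still carries the body.
        return err.code, err.read().decode("utf-8")
```

Usage would look like `send(build_async_prediction("http://localhost:5000", {"prompt": "a photo of zwx man", "width": 512, "height": 512}, webhook_url))`, assuming the container is listening on port 5000 as in the curl commands.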

Is there any documentation on async invocations using cog predict?

Feb 09 '23 03:02 achuinard

Also, I'm trying to use the PUT endpoint instead, passing in a generated UUID. Oddly, it just 404s on me. I must be doing something terribly wrong.

root@mutaro-image-generator:/home/tony_chuinard# curl -X PUT http://localhost:50000/predictions/6e2d263d-40a0-4219-ad62-ea12b7922da1 -H "Prefer: respond-async" -H "Content-Type: application/json"  -d '{"input":{"prompt":"a photo of zwx man", "width": 512, "height": 512, "disable_safety_check": true}, "webhook":"https://en58kxviypofc.x.pipedream.net"}'
{"detail":"Not Found"}
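
Two observations on the 404, offered as things to check rather than a diagnosis. First, `{"detail":"Not Found"}` is FastAPI's default body when no route matches, so the request did reach an HTTP server; it may simply be that the cog version in this image does not expose `PUT /predictions/{id}`. Second, this command targets port 50000, while the earlier ones used 5000. The sketch below builds the PUT request and lists the routes advertised in an OpenAPI schema (FastAPI apps serve one at `/openapi.json` by default). The helper names are hypothetical, and whether this image serves the schema is an assumption.

```python
import json
import urllib.request
import uuid

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def build_put_prediction(
    base_url: str, prediction_id: str, prediction_input: dict, webhook: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT /predictions/<id> request."""
    body = json.dumps({"input": prediction_input, "webhook": webhook}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predictions/{prediction_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "Prefer": "respond-async"},
    )


def list_routes(openapi_schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from a parsed OpenAPI document."""
    routes = []
    for path, operations in openapi_schema.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def new_prediction_id() -> str:
    """Generate a client-side ID in the same UUID format used in the curl command."""
    return str(uuid.uuid4())
```

If the schema fetched from the container lists no PUT route under `/predictions/{...}`, the 404 would be expected for that image and the POST endpoint is the one to use.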

Feb 09 '23 04:02 achuinard
