Output Docker container fails to run predictions

Open achuinard opened this issue 2 years ago • 7 comments

I'm running my private Replicate Docker image on my own Google Cloud VM instance after running a Dreambooth training.

Unfortunately, this command fails:

curl http://localhost:5000/predictions -X POST -H "Content-Type: application/json" -d '{"input":{"prompt":"a photo of zwx man", "width": 512, "height": 512}}'

It quickly fails with just "Internal Server Error". However, when looking at the Docker container logs, it seems like a prediction actually ran.

Using seed: 12294
using txt2img
INFO:     172.17.0.1:57596 - "POST /predictions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/applications.py", line 270, in __call__
    await super().__call__(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/applications.py", line 124, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/routing.py", line 706, in __call__
    await route.handle(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/routing.py", line 235, in app
    raw_response = await run_endpoint_function(
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/cog/server/http.py", line 94, in predict
    generic_response = runner.predict(request).get()
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/cog/server/runner.py", line 84, in predict
    handler.append_logs(event.message)
  File "/root/.pyenv/versions/3.10.9/lib/python3.10/site-packages/cog/server/runner.py", line 142, in append_logs
    assert self.p.logs
AssertionError
  0%|          | 0/50 [00:00<?, ?it/s]
  2%|▏         | 1/50 [00:01<00:51,  1.05s/it]
  6%|▌         | 3/50 [00:01<00:14,  3.21it/s]
 12%|█▏        | 6/50 [00:01<00:06,  6.65it/s]
 18%|█▊        | 9/50 [00:01<00:04,  9.68it/s]
 24%|██▍       | 12/50 [00:01<00:03, 12.27it/s]
 30%|███       | 15/50 [00:01<00:02, 14.33it/s]
 36%|███▌      | 18/50 [00:01<00:02, 15.96it/s]
 42%|████▏     | 21/50 [00:02<00:01, 17.23it/s]
 48%|████▊     | 24/50 [00:02<00:01, 18.18it/s]
 54%|█████▍    | 27/50 [00:02<00:01, 18.87it/s]
 60%|██████    | 30/50 [00:02<00:01, 19.28it/s]
 66%|██████▌   | 33/50 [00:02<00:00, 19.60it/s]
 72%|███████▏  | 36/50 [00:02<00:00, 19.89it/s]
 78%|███████▊  | 39/50 [00:02<00:00, 20.11it/s]
 84%|████████▍ | 42/50 [00:03<00:00, 20.29it/s]
 90%|█████████ | 45/50 [00:03<00:00, 20.41it/s]
 96%|█████████▌| 48/50 [00:03<00:00, 20.48it/s]
100%|██████████| 50/50 [00:03<00:00, 14.49it/s]

What is going on at the HTTP layer here? Why can't I POST to my Docker container successfully?
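A note on the last traceback frame, `assert self.p.logs`: in Python, a bare `assert x` fails for any falsy value, so it raises on an empty string as well as on `None`. The sketch below shows that behaviour with hypothetical stand-in classes (`FakePrediction`, `FakeHandler`). It is not Cog's code, and the thread does not confirm that an empty `logs` value is the cause here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakePrediction:
    # Hypothetical stand-in for a prediction record; not Cog's real model.
    logs: Optional[str] = ""


class FakeHandler:
    # Hypothetical stand-in for the handler seen in the traceback.
    def __init__(self, prediction: FakePrediction) -> None:
        self.p = prediction

    def append_logs_truthy(self, message: str) -> None:
        # Same shape as the failing line: a truthiness check.
        # An empty string is falsy, so this raises AssertionError.
        assert self.p.logs
        self.p.logs += message

    def append_logs_not_none(self, message: str) -> None:
        # An explicit None check accepts an empty string.
        assert self.p.logs is not None
        self.p.logs += message


def truthy_assert_fails_on_empty_logs() -> bool:
    """Return True if the truthiness assert rejects an empty log buffer."""
    handler = FakeHandler(FakePrediction(logs=""))
    try:
        handler.append_logs_truthy("Using seed: 12294\n")
    except AssertionError:
        return True
    return False
```

If this reading applies, the prediction itself can run to completion (as the progress bar suggests) while the HTTP layer still returns a 500, because the failure happens when log output is attached to the response.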

achuinard avatar Feb 06 '23 06:02 achuinard

This might be an issue with the recent work adding async support to https://github.com/replicate/cog

@achuinard to reproduce: I assume you trained this model recently? Can you provide an ID from the dreambooth training API?

anotherjesse avatar Feb 07 '23 13:02 anotherjesse

Yes, these are all recent model trainings. One recent prediction ID is mc4st5d7ozhv5gc25olufyotpi.

Thanks for getting back to me, @anotherjesse.

achuinard avatar Feb 07 '23 15:02 achuinard

Also, how is it working on Replicate if the cogs themselves are just broken? You must be doing some magic!

achuinard avatar Feb 07 '23 21:02 achuinard

Or I just need to start using the async header and letting Cog webhook me.

achuinard avatar Feb 07 '23 21:02 achuinard

@achuinard - we have switched over to cog's "async" style:

  • the async API for cog makes the requests look the same as on replicate, so users can switch between the two (local cog vs replicate) more easily
  • async is better for larger systems :)

That said, if the sync API is breaking, that isn't good.

To clarify: the prediction you shared is a dreambooth training. For the cog you are running locally, which of these applies?

  • Are you downloading and running the cog image we built as part of the dreambooth API?
  • Or did you download those weights and build your own dreambooth inference using cog directly? (Some of our users build their own cogs using https://github.com/replicate/dreambooth-template )

anotherjesse avatar Feb 08 '23 15:02 anotherjesse

@anotherjesse This is the Cog / Docker image that your Dreambooth API creates. I did not go through the process of building my own Cog.

Async makes sense, and I think I'm using it right:

curl http://localhost:5000/predictions -X POST -H "Prefer: respond-async" -H "Content-Type: application/json;charset=utf-8"  -d '{"input":{"prompt":"a photo of zwx man", "width": 512, "height": 512, "disable_safety_check": true}, "webhook":"https://en58kxviypofc.x.pipedream.net"}'

Still no luck though.

Is there any documentation on async invocations using cog predict?
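For reference, the curl call above can be written in Python with only the standard library. This is a minimal sketch, not a documented Cog client: the function names `build_async_prediction_request` and `send` are made up for illustration, and the `base_url` default simply copies the host and port from the command above.

```python
import json
import urllib.error
import urllib.request


def build_async_prediction_request(
    prediction_input: dict,
    webhook_url: str,
    base_url: str = "http://localhost:5000",  # assumption: same host/port as the curl call
) -> urllib.request.Request:
    """Build (but do not send) a POST /predictions request in async style."""
    body = json.dumps({"input": prediction_input, "webhook": webhook_url})
    return urllib.request.Request(
        url=f"{base_url}/predictions",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Header taken from the curl command in this thread.
            "Prefer": "respond-async",
        },
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> tuple[int, str]:
    """Send the request and return (status code, response body)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; return them the same way.
        return err.code, err.read().decode("utf-8")
```

Calling `send(build_async_prediction_request({"prompt": "a photo of zwx man", "width": 512, "height": 512}, webhook_url))` returns the status code and body together, which makes it easier to see what the container answers before any webhook fires.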

achuinard avatar Feb 09 '23 03:02 achuinard

Also, I'm trying to use the PUT endpoint instead, passing in a generated UUID. Oddly, it just 404s on me. I must be doing something terribly wrong.

root@mutaro-image-generator:/home/tony_chuinard# curl -X PUT http://localhost:50000/predictions/6e2d263d-40a0-4219-ad62-ea12b7922da1 -H "Prefer: respond-async" -H "Content-Type: application/json"  -d '{"input":{"prompt":"a photo of zwx man", "width": 512, "height": 512, "disable_safety_check": true}, "webhook":"https://en58kxviypofc.x.pipedream.net"}'
{"detail":"Not Found"}
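The PUT attempt can be sketched the same way in Python, with the UUID generated client-side. The helper name `build_put_prediction_request` is hypothetical, and the thread does not establish whether the Cog version inside this image serves a PUT route at all. The `base_url` default of port 5000 comes from the earlier POST commands, whereas the PUT command above targets port 50000.

```python
import json
import urllib.request
import uuid
from typing import Optional


def build_put_prediction_request(
    prediction_input: dict,
    webhook_url: str,
    base_url: str = "http://localhost:5000",  # assumption: port used by the earlier POSTs
    prediction_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a PUT /predictions/<id> request."""
    # Generate a random UUID when the caller does not supply an ID.
    prediction_id = prediction_id or str(uuid.uuid4())
    body = json.dumps({"input": prediction_input, "webhook": webhook_url})
    return urllib.request.Request(
        url=f"{base_url}/predictions/{prediction_id}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Prefer": "respond-async",
        },
    )
```

Printing `request.full_url` before sending is a quick way to confirm that the host, port, and path match what the container exposes.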

achuinard avatar Feb 09 '23 04:02 achuinard