
Large models timeout on download

Open Vochsel opened this issue 1 year ago • 4 comments

When running cog predict on large models (SDXL, for example), users with slow internet connections, or users far away from the weight storage (Australia seems to be quite far from the r8.im storage), hit a timeout while the model weights are downloading.

Example command: cog predict r8.im/stability-ai/sdxl@sha256:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b -i prompt="A bunny" --debug

Output:

Checking for updates...
$ docker image inspect r8.im/stability-ai/sdxl@sha256:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b

Starting Docker image r8.im/stability-ai/sdxl@sha256:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b and running setup()...
$ docker run --rm --shm-size 8G --detach --env COG_LOG_LEVEL=debug --gpus all --publish 0:5000 r8.im/stability-ai/sdxl@sha256:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b
result of update check:
{"logger": "torch.distributed.nn.jit.instantiator", "timestamp": "2023-12-31T00:12:18.880798Z", "severity": "INFO", "message": "Created a temporary directory at /tmp/tmp6ian2rtl"}
{"logger": "torch.distributed.nn.jit.instantiator", "timestamp": "2023-12-31T00:12:18.881083Z", "severity": "INFO", "message": "Writing /tmp/tmp6ian2rtl/_remote_module_non_scriptable.py"}
{"logger": "uvicorn.error", "timestamp": "2023-12-31T00:12:19.798613Z", "severity": "INFO", "message": "Started server process [8]"}
{"logger": "uvicorn.error", "timestamp": "2023-12-31T00:12:19.798739Z", "severity": "INFO", "message": "Waiting for application startup."}
{"logger": "uvicorn.error", "timestamp": "2023-12-31T00:12:19.801408Z", "severity": "INFO", "message": "Application startup complete."}
{"logger": "uvicorn.error", "timestamp": "2023-12-31T00:12:19.801978Z", "severity": "INFO", "message": "Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)"}
Loading safety checker...
downloading url:  https://weights.replicate.delivery/default/sdxl/safety-1.0.tar
downloading to:  ./safety-cache
ⅹ Timed out

Is there any way to increase this timeout?
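
For anyone trying to reproduce this, here is a rough sketch for estimating how long the weights download would take on a given connection. It is not part of cog: `measure_throughput`, `estimate_download_seconds`, and `report` are names made up for illustration, and the only thing taken from the log above is the weights URL. It samples the first few megabytes with Python's standard library and extrapolates from the Content-Length header (assuming the server sends one), so treat the result as a ballpark figure.

```python
import time
import urllib.request


def estimate_download_seconds(total_bytes: int, bytes_per_second: float) -> float:
    """Extrapolate the full download time from a measured transfer rate."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return total_bytes / bytes_per_second


def measure_throughput(url: str, sample_bytes: int = 8 * 1024 * 1024,
                       timeout: float = 30.0):
    """Download up to sample_bytes from url.

    Returns (bytes_per_second, total_bytes), where total_bytes is None
    if the server does not report a Content-Length. The measured time
    includes connection setup, so small samples understate the rate.
    """
    start = time.monotonic()
    received = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
        total = int(length) if length is not None else None
        while received < sample_bytes:
            chunk = resp.read(min(64 * 1024, sample_bytes - received))
            if not chunk:  # server closed the stream early
                break
            received += len(chunk)
    elapsed = time.monotonic() - start
    if received == 0 or elapsed <= 0:
        raise RuntimeError("no data received, cannot measure throughput")
    return received / elapsed, total


def report(url: str) -> None:
    """Print the sampled rate and, if the size is known, the estimated total time."""
    rate, total = measure_throughput(url)
    print(f"sampled rate: {rate / 1e6:.2f} MB/s")
    if total is not None:
        seconds = estimate_download_seconds(total, rate)
        print(f"file size: {total / 1e6:.0f} MB, estimated time: {seconds:.0f} s")
```

Calling `report("https://weights.replicate.delivery/default/sdxl/safety-1.0.tar")` from the affected machine gives a number to compare against however long cog waits before printing "Timed out".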

Thanks!

Vochsel, Dec 31 '23 00:12

This is still an issue in the latest version of cog. It makes cog impossible to use locally.

Vochsel, Mar 11 '24 22:03

same issue

idootop, May 10 '24 07:05