cog
Large models time out on download
When running cog predict on large models (SDXL, for example), users with slow internet connections, or who are far from the weight storage (Australia seems to be quite far from the r8.im storage), experience timeouts.
Example command:
cog predict r8.im/stability-ai/sdxl@sha256:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b -i prompt="A bunny" --debug
Output:
Checking for updates...
$ docker image inspect r8.im/stability-ai/sdxl@sha256:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b
Starting Docker image r8.im/stability-ai/sdxl@sha256:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b and running setup()...
$ docker run --rm --shm-size 8G --detach --env COG_LOG_LEVEL=debug --gpus all --publish 0:5000 r8.im/stability-ai/sdxl@sha256:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b
result of update check:
{"logger": "torch.distributed.nn.jit.instantiator", "timestamp": "2023-12-31T00:12:18.880798Z", "severity": "INFO", "message": "Created a temporary directory at /tmp/tmp6ian2rtl"}
{"logger": "torch.distributed.nn.jit.instantiator", "timestamp": "2023-12-31T00:12:18.881083Z", "severity": "INFO", "message": "Writing /tmp/tmp6ian2rtl/_remote_module_non_scriptable.py"}
{"logger": "uvicorn.error", "timestamp": "2023-12-31T00:12:19.798613Z", "severity": "INFO", "message": "Started server process [8]"}
{"logger": "uvicorn.error", "timestamp": "2023-12-31T00:12:19.798739Z", "severity": "INFO", "message": "Waiting for application startup."}
{"logger": "uvicorn.error", "timestamp": "2023-12-31T00:12:19.801408Z", "severity": "INFO", "message": "Application startup complete."}
{"logger": "uvicorn.error", "timestamp": "2023-12-31T00:12:19.801978Z", "severity": "INFO", "message": "Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)"}
Loading safety checker...
downloading url: https://weights.replicate.delivery/default/sdxl/safety-1.0.tar
downloading to: ./safety-cache
ⅹ Timed out
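To check whether the raw weight download is the slow part, independent of cog, the URL from the log above can be fetched directly with a generous timeout. This is only a rough diagnostic sketch: the helper names `copy_stream` and `timed_download` and the 600-second timeout are my own assumptions, not anything cog provides, and it does not change cog's own timeout.

```python
import time
import urllib.request


def copy_stream(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in chunks and return the number of bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def timed_download(url, dest_path, timeout=600):
    """Download url to dest_path and return (bytes, elapsed seconds).

    timeout is in seconds and applies to each blocking socket
    operation, not to the download as a whole (assumed value).
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        with open(dest_path, "wb") as out:
            size = copy_stream(resp, out)
    return size, time.monotonic() - start


# Example call, using the URL from the log:
# size, seconds = timed_download(
#     "https://weights.replicate.delivery/default/sdxl/safety-1.0.tar",
#     "safety-1.0.tar",
# )
# print(f"{size} bytes in {seconds:.1f}s")
```

If this also takes several minutes, the connection to the weight storage is probably the bottleneck rather than anything in the model's setup().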
Is there any way to increase this timeout?
Thanks!
This is still an issue in the latest cog. It makes cog impossible to use locally.
Same issue.