cog
Fail while pushing with --separate-weights
Today I started hitting the following issue while using cog push --separate-weights, although I was able to push the model without the --separate-weights flag.
In the error output below, I just replaced the project and model names with <project-name> and <model-name>.
$ cog push --separate-weights
⚠ Cog doesn't know if CUDA 11.2.2 is compatible with PyTorch 1.13.1. This might cause CUDA problems.
Building Docker image from environment in cog.yaml as r8.im/<project-name>/<model-name> ...
Weights unchanged, skip rebuilding and use cached image...
[+] Building 4.0s (7/7) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 22.25kB 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 4.41kB 0.0s
=> resolve image config for docker.io/docker/dockerfile:1.4 1.6s
=> CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a1 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04 1.2s
=> ERROR [internal] load metadata for r8.im/<project-name>/<model-name> 2.2s
=> [auth] <project-name>/<model-name> -weights:pull token for r8.im 0.0s
------
> [internal] load metadata for r8.im/<project-name>/<model-name>-weights:latest:
------
Dockerfile:2
--------------------
1 | #syntax=docker/dockerfile:1.4
2 | >>> FROM r8.im/<project-name>/<model-name>-weights AS weights
3 | FROM nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04
4 | ENV DEBIAN_FRONTEND=noninteractive
--------------------
ERROR: failed to solve: failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://r8.im/_token?scope=repository%3A<project-name>%2F<model-name>-weights%3Apull&service=us-docker.pkg.dev: 404 Not Found
ⅹ Failed to build runner Docker image: Failed to build Docker image: exit status 1
same here!
facing same issue
Facing the same issue today, but a week ago it worked fine with --separate-weights.
can we get some help on this?
@hongchaodeng I saw you implemented this feature. Do you know what's going on? Thank you so much!!
I got a similar error, but deleting "path/to/your/cog_project/.dockerignore" and "path/to/your/cog_project/.dockerignore/.cog" files solved it for me.
I faced a similar issue too. The docker build was failing to find the copied data.
=> ERROR [1/4] COPY checkpoints/canny /src/checkpoints/canny 0.0s
=> ERROR [2/4] COPY checkpoints/ip_adapter /src/checkpoints/ip_adapter 0.0s
=> ERROR [3/4] COPY checkpoints/tile /src/checkpoints/tile 0.0s
=> ERROR [4/4] COPY checkpoints/vae /src/checkpoints/vae
...
Dockerfile:11
--------------------
9 | COPY checkpoints/canny /src/checkpoints/canny
10 | COPY checkpoints/ip_adapter /src/checkpoints/ip_adapter
11 | >>> COPY checkpoints/vae /src/checkpoints/vae
--------------------
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref 46e45d4e-74bc-4316-b8d3-ef813683c1c8::umpry926pu2og534hz3uqwpxt: "checkpoints/vae": not found
even though the files were actually there.
I stopped using replicate due to the poor tech support and framework.
What are you using as a replacement?
Runpod is way better with better support.
PS: like @masahiro-koga-jai, deleting the .dockerignore solved it for me. The .dockerignore is updated during cog build, and it obviously conflicts.
@ynie @hervenivon This and some other issues lead to a frustrating DX on Replicate, but YMMV building on Runpod. Personally my experience matches the reports here https://www.reddit.com/r/LocalLLaMA/comments/17il9n3/experience_on_runpod/
(I would definitely prefer Runpod's 4090's over A40's for image gen – they're half the price and twice as fast.)
You may also need to rm -r .cog/. I believe I got this error after a bad cog push --separate-weights.
My guess is r8.im/<project-name>/<model-name>-weights gets created on the first invocation only.
Deleting Cog's build folder seems to have forced it to create the missing image.
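For anyone who wants to script the cleanup described above, here is a minimal sketch. It assumes only what was reported in this thread: that removing the project's .cog/ folder (and, for some people, the .dockerignore) lets the next cog push --separate-weights start fresh. The function name reset_cog_build_state and its remove_dockerignore parameter are made up for illustration and are not part of Cog.

```python
import shutil
import tempfile
from pathlib import Path


def reset_cog_build_state(project_dir: str, remove_dockerignore: bool = False) -> list:
    """Delete Cog's local build folder and, optionally, .dockerignore.

    Hypothetical helper: returns the names of the entries actually removed.
    """
    root = Path(project_dir)
    removed = []

    cog_dir = root / ".cog"
    if cog_dir.is_dir():
        shutil.rmtree(cog_dir)  # same effect as `rm -r .cog/`
        removed.append(".cog")

    dockerignore = root / ".dockerignore"
    if remove_dockerignore and dockerignore.is_file():
        dockerignore.unlink()
        removed.append(".dockerignore")

    return removed


if __name__ == "__main__":
    # Try it on a throwaway directory rather than a real project.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / ".cog").mkdir()
        (Path(tmp) / ".dockerignore").write_text(".cog/\n")
        print(reset_cog_build_state(tmp, remove_dockerignore=True))
```

Back up your .dockerignore first if you maintain it by hand, since the sketch deletes it outright when remove_dockerignore is set.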
I'm still shocked that this is still an issue after so many months. I remember wasting so many hours trying to fix this. Does anyone working at Replicate care?
Based on their Discord, my sense is they're absolutely swamped by end-users who mostly want to use the web frontends for various tools. Ideally Replicate knows this is not their core business, but I'm not so sure. I suspect they're feeling stronger PMF on the front-end than on the infra side of things.
Actually, I find cog super convenient for some of the projects I'm working on, but I do agree that the UX has some flaws.
Glad to find support in the community. Thanks! 🙏
Yes, I had added .cog/ to my .dockerignore file; removing it solved the problem for me.
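If you'd rather check than delete, a rough sketch like the one below can flag which .dockerignore lines would exclude a given path such as .cog or checkpoints/vae. It is only an approximation: Docker's real matching follows Go's filepath.Match rules plus `**` and `!` exceptions, which this ignores. The function name ignored_by is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def ignored_by(dockerignore_text: str, path: str) -> list:
    """Return the .dockerignore patterns that would exclude `path`.

    Simplified sketch: tests the path and each parent directory against
    every pattern with fnmatch; `!` exception lines are skipped, not applied.
    """
    target = PurePosixPath(path.strip("/"))
    # The path itself plus every parent, e.g. checkpoints/vae -> checkpoints
    candidates = [str(target)] + [str(p) for p in target.parents if str(p) != "."]

    hits = []
    for raw in dockerignore_text.splitlines():
        pattern = raw.strip()
        if not pattern or pattern.startswith("#") or pattern.startswith("!"):
            continue  # blank line, comment, or exception rule
        normalized = pattern.strip("/")
        if any(fnmatch(c, normalized) for c in candidates):
            hits.append(pattern)
    return hits
```

Run it against the contents of your .dockerignore with the paths from the failing COPY lines or with .cog/tmp/weights. Any pattern it returns is a candidate for the conflict people describe above.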