
catch out-of-memory errors on `cog build`

Open · zeke opened this issue 3 years ago • 2 comments

```
$ /usr/local/bin/cog build
Building Docker image from environment in cog.yaml as cog-resnet...
[+] Building 67.4s (10/12)
 => [internal] load build definition from Dockerfile                                                                                                     0.0s
.....
 => [stage-0 2/6] COPY .cog/tmp/build1290162384/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl                                       0.0s
 => [stage-0 3/6] RUN --mount=type=cache,target=/root/.cache/pip pip install /tmp/cog-0.0.1.dev-py3-none-any.whl                                        12.2s
 => ERROR [stage-0 4/6] RUN --mount=type=cache,target=/root/.cache/pip pip install   pillow==8.3.1 tensorflow==2.5.1                                    35.5s
------
 > [stage-0 4/6] RUN --mount=type=cache,target=/root/.cache/pip pip install   pillow==8.3.1 tensorflow==2.5.1:
#10 4.397 Collecting pillow==8.3.1
#10 4.482   Downloading Pillow-8.3.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (3.0 MB)
#10 6.610 Collecting tensorflow==2.5.1
#10 8.083   Downloading tensorflow-2.5.1-cp38-cp38-manylinux2010_x86_64.whl (454.5 MB)
#10 35.49 Killed
------
error: failed to solve: executor failed running [/bin/sh -c pip install   pillow==8.3.1 tensorflow==2.5.1]: exit code: 137
ⅹ Failed to build Docker image: exit status 1
```

@bfirsh says "Killed" (exit code 137) is a sign that the process is out of memory.
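Why 137 points at memory: a process terminated by signal N is reported with exit status 128 + N, so 137 decodes to signal 9 (SIGKILL), the uncatchable signal the Linux out-of-memory killer sends. A minimal sketch of that decoding using Python's standard `signal` module follows; `describe_exit_code` is a hypothetical helper for illustration, not part of cog, and it assumes a POSIX system where `signal.SIGKILL` exists.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Hypothetical helper: explain an exit status like the 137 from `cog build`."""
    if code <= 128:
        return f"exit code {code}"
    # A process terminated by signal N is reported as exit status 128 + N.
    try:
        sig = signal.Signals(code - 128)
    except ValueError:
        return f"exit code {code}: terminated by unknown signal {code - 128}"
    # SIGKILL cannot be caught or handled; the Linux OOM killer uses it.
    hint = " (often the out-of-memory killer)" if sig == signal.SIGKILL else ""
    return f"exit code {code}: terminated by {sig.name}{hint}"


print(describe_exit_code(137))
print(describe_exit_code(143))
```

Note that SIGKILL alone does not prove an OOM kill (anything can send it), which is why the sketch says "often".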

The quick fix is to allocate more memory to Docker.

The longer-term fix is for cog to catch this error and report it to the user.
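One possible approach, sketched here in Python under the assumption that the build output is available as text, is to scan the Docker build log for the OOM signature seen above (a step ending in `Killed` and an `exit code: 137` failure) and print a hint. `diagnose_build_log` and its message are hypothetical; this is not how cog actually implements error reporting.

```python
import re
from typing import Optional

# Matches BuildKit's "...: exit code: 137" failure line.
EXIT_CODE_RE = re.compile(r"exit code: (\d+)")


def diagnose_build_log(log: str) -> Optional[str]:
    """Hypothetical helper: return a hint if a Docker build log looks like an OOM kill."""
    exit_codes = {int(code) for code in EXIT_CODE_RE.findall(log)}
    # The shell prints a bare "Killed" when a child is terminated by SIGKILL.
    killed = any(line.split()[-1:] == ["Killed"] for line in log.splitlines())
    if 137 in exit_codes or killed:
        return (
            "A build step was killed (exit code 137), which usually means Docker "
            "ran out of memory. Try allocating more memory to Docker."
        )
    return None


if __name__ == "__main__":
    sample = (
        "#10 35.49 Killed\n"
        "error: failed to solve: executor failed running "
        "[/bin/sh -c pip install   pillow==8.3.1 tensorflow==2.5.1]: exit code: 137\n"
    )
    print(diagnose_build_log(sample))
```

Returning `None` for unrecognized failures lets the caller fall back to printing the raw Docker error unchanged.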

zeke · Sep 17 '21 19:09

Note for anyone who runs into this in the future: you can adjust Docker's memory limit directly (e.g. in Docker Desktop, which I installed to build a model to push to Replicate). I originally looked for a command-line flag in cog but didn't find one, and none is needed anyway since the limit isn't cog-specific.

LWprogramming · Aug 31 '23 03:08

Similarly, `cog push` failed with `ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device`.

I ran `docker system prune` to free up local disk space, which fixed the build step.
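Unlike exit code 137, `[Errno 28]` is a disk-space error: on Linux, errno 28 is `ENOSPC`. A minimal Python sketch of turning pip's `[Errno N]` message into a symbolic name plus a hint, using the standard `errno` and `os` modules, is below. `explain_oserror` and the `HINTS` table are hypothetical, and the sketch assumes Linux errno numbering.

```python
import errno
import os
import re
from typing import Optional

# pip wraps OS errors as "... OSError: [Errno 28] No space left on device".
ERRNO_RE = re.compile(r"\[Errno (\d+)\]")

# Hypothetical hint table keyed by symbolic errno name.
HINTS = {
    "ENOSPC": "Docker is out of disk space; `docker system prune` can reclaim some.",
}


def explain_oserror(line: str) -> Optional[str]:
    """Hypothetical helper: turn an '[Errno N]' message into a named error plus a hint."""
    match = ERRNO_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, f"errno {code}")
    # rstrip() drops the trailing space when there is no hint for this errno.
    return f"{name} ({os.strerror(code)}). {HINTS.get(name, '')}".rstrip()


print(explain_oserror(
    "ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device"
))
```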

emcmanus · May 01 '24 00:05