
Siammask - base docker image tag is invalid

Open saurabheights opened this issue 1 year ago • 2 comments

My actions before raising this issue

  • [x] Read/searched the docs
  • [x] Searched past issues

Deploying SiamMask currently fails because the function config uses an obsolete base Docker image tag.

https://github.com/opencv/cvat/blob/develop/serverless/pytorch/foolwood/siammask/nuclio/function-gpu.yaml#L21 uses nvidia/cuda:11.1-devel-ubuntu20.04, but this tag is no longer valid (presumably it used to work and NVIDIA has since removed it).

Note - I reproduced the error on CVAT version 2.1.0 (master); the issue is also present in the develop branch.

Expected Behaviour

Model should load.

Current Behaviour

Deploying siammask causes the following error -

$ nuctl deploy --project-name cvat --path /prod/cvat/serverless/pytorch/foolwood/siammask/nuclio --volume /prod/cvat/serverless/common:/opt/nuclio/common --file /prod/cvat/serverless/pytorch/foolwood/siammask/nuclio/function-gpu.yaml --platform local

22.09.06 14:57:09.434                     nuctl (W) Failed to create a function; setting the function status {"err": "Failed to build processor image", "errVerbose": "\nError - exit status 1\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\nSending build context to Docker daemon  44.69MB\r\r\nStep 1/25 : FROM nvidia/cuda:11.1-devel-ubuntu20.04\n\nstderr:\nmanifest for nvidia/cuda:11.1-devel-ubuntu20.04 not found: manifest unknown: manifest unknown\n\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\nFailed to build\n    /nuclio/pkg/dockerclient/shell.go:118\nFailed to build docker image\n    .../pkg/containerimagebuilderpusher/docker.go:53\nFailed to build processor image\n    /nuclio/pkg/processor/build/builder.go:250\nFailed to build processor image", "errCauses": [{"error": "Failed to build docker image", "errorVerbose": "\nError - exit status 1\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\nSending build context to Docker daemon  44.69MB\r\r\nStep 1/25 : FROM nvidia/cuda:11.1-devel-ubuntu20.04\n\nstderr:\nmanifest for nvidia/cuda:11.1-devel-ubuntu20.04 not found: manifest unknown: manifest unknown\n\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\nFailed to build\n    /nuclio/pkg/dockerclient/shell.go:118\nFailed to build docker image\n    .../pkg/containerimagebuilderpusher/docker.go:53\nFailed to build docker image", "errorCauses": [{"error": "Failed to build", "errorVerbose": "\nError - exit status 1\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\nSending build context to Docker daemon  44.69MB\r\r\nStep 1/25 : FROM nvidia/cuda:11.1-devel-ubuntu20.04\n\nstderr:\nmanifest for nvidia/cuda:11.1-devel-ubuntu20.04 not found: manifest unknown: manifest unknown\n\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\nFailed to build\n    /nuclio/pkg/dockerclient/shell.go:118\nFailed to build", "errorCauses": [{"error": "stdout:\nSending build context to Docker daemon  44.69MB\r\r\nStep 1/25 : FROM 
nvidia/cuda:11.1-devel-ubuntu20.04\n\nstderr:\nmanifest for nvidia/cuda:11.1-devel-ubuntu20.04 not found: manifest unknown: manifest unknown\n", "errorVerbose": "\nError - exit status 1\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\nSending build context to Docker daemon  44.69MB\r\r\nStep 1/25 : FROM nvidia/cuda:11.1-devel-ubuntu20.04\n\nstderr:\nmanifest for nvidia/cuda:11.1-devel-ubuntu20.04 not found: manifest unknown: manifest unknown\n\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\nstdout:\nSending build context to Docker daemon  44.69MB\r\r\nStep 1/25 : FROM nvidia/cuda:11.1-devel-ubuntu20.04\n\nstderr:\nmanifest for nvidia/cuda:11.1-devel-ubuntu20.04 not found: manifest unknown: manifest unknown\n", "errorCauses": [{"error": "exit status 1"}]}]}]}]}

Error - exit status 1
    /nuclio/pkg/cmdrunner/shellrunner.go:96

Call stack:
stdout:
Sending build context to Docker daemon  44.69MB
Step 1/25 : FROM nvidia/cuda:11.1-devel-ubuntu20.04

stderr:
manifest for nvidia/cuda:11.1-devel-ubuntu20.04 not found: manifest unknown: manifest unknown

    /nuclio/pkg/cmdrunner/shellrunner.go:96
Failed to build
    /nuclio/pkg/dockerclient/shell.go:118
Failed to build docker image
    .../pkg/containerimagebuilderpusher/docker.go:53
Failed to build processor image
    /nuclio/pkg/processor/build/builder.go:250
Failed to deploy function
    ...//nuclio/pkg/platform/abstract/platform.go:182

Possible Solution

Use the NVIDIA image tag nvidia/cuda:11.1.1-devel-ubuntu20.04 instead.
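As a rough sketch of that change, the tag can be swapped in place with sed. The helper name `fix_base_image` is hypothetical and introduced only for illustration; the file path in the usage comment is the one linked above. The function rewrites whatever file it is given, so review the result with `git diff` before deploying.

```shell
# Hypothetical helper: replace the obsolete CUDA tag with the 11.1.1 tag
# in the file passed as the first argument.
fix_base_image() {
  file=$1
  tmp=$(mktemp) || return 1
  # Dots in the old tag are escaped so they match literally.
  # Writing through a temp file avoids relying on non-portable `sed -i`,
  # and `cat >` keeps the original file's permissions.
  sed 's|nvidia/cuda:11\.1-devel-ubuntu20\.04|nvidia/cuda:11.1.1-devel-ubuntu20.04|g' "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}

# Usage (run from the root of the cvat checkout):
# fix_base_image serverless/pytorch/foolwood/siammask/nuclio/function-gpu.yaml
```

Running the helper a second time leaves the file unchanged, because the new tag does not contain the old one as a substring.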

Steps to Reproduce (for bugs)

  1. Run docker pull nvidia/cuda:11.1-devel-ubuntu20.04.
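To compare the two tags without pulling full images, something like the following may help. It is a sketch that assumes a Docker CLI where the `docker manifest inspect` subcommand is available (older releases may require experimental CLI features to be enabled). The function name `check_tag` and the `DOCKER_BIN` override are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: report whether the registry serves a manifest for an image reference.
# DOCKER_BIN can point at another docker-compatible binary; it defaults to `docker`.
check_tag() {
  if "${DOCKER_BIN:-docker}" manifest inspect "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    # Also reached when the docker binary is missing or the registry is unreachable.
    echo "missing: $1"
  fi
}

check_tag nvidia/cuda:11.1-devel-ubuntu20.04     # tag used by function-gpu.yaml
check_tag nvidia/cuda:11.1.1-devel-ubuntu20.04   # tag proposed as the fix
```

A "missing" result only means the manifest lookup failed, so it is worth confirming with a plain `docker pull` if the network or credentials are in doubt.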

Context

Trying to deploy Siammask for tracking objects (automatic annotation).

Your Environment

  • Git hash commit (git log -1): commit 3bd7c7e422d57986bd629da07214a3a3e666c68c (HEAD -> master, tag: v2.1.0, origin/master)
  • Docker version docker version (e.g. Docker 17.0.05): Docker version 20.10.9, build c2ea9bc
  • Are you using Docker Swarm or Kubernetes? No
  • Operating System and version (e.g. Linux, Windows, MacOS): Linux

P.S. I can submit a PR if that is all right with the admins.

saurabheights • Sep 06 '22 14:09

Can confirm, I had the same issue: https://github.com/opencv/cvat/pull/4886/commits/b7de6aef32229cb628e66ae20ec0bc9ffeb68bcd

dschoerk • Sep 07 '22 14:09

@saurabheights, if you can prepare the PR, that would be great!

nmanovic • Sep 08 '22 14:09

@nmanovic Sorry, I just saw your message. I have made the PR.

saurabheights • Oct 03 '22 18:10

The required changes have been merged into the develop branch, so I am closing the issue.

saurabheights • Oct 05 '22 08:10