Semi-automatic Annotation - Documentation outdated, Nvidia, NO_PUBKEY A4B469963BF863CC
My actions before raising this issue
- [x] Read/searched the docs
- [x] Searched past issues
Trying to enable semi-automatic annotation with GPU support on the latest stable version, as documented at https://openvinotoolkit.github.io/cvat/docs/administration/advanced/installation_automatic_annotation/, fails because Nvidia has changed the signing key of its CUDA apt repository.
Expected Behaviour
Following the documentation should result in successful installation of
serverless/tensorflow/matterport/mask_rcnn/nuclio
The documentation should be updated to either:
- state that there is currently no fix, or
- add a correction to the code or to the documentation
Current Behaviour
Calling
nuctl deploy --project-name cvat \
  --path serverless/tensorflow/matterport/mask_rcnn/nuclio \
  --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
  --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
  --image cvat/tf.matterport.mask_rcnn_gpu \
  --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
  --resource-limit nvidia.com/gpu=1
ends with
Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.
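When the build log is long, it can help to pull just the missing key IDs out of the apt output. The following is a minimal sketch, not part of CVAT or nuclio; the function name `missing_apt_keys` is hypothetical, and it assumes the apt output is piped in on stdin.

```shell
# Print each distinct key ID that apt reported as NO_PUBKEY, one per line.
# Reads apt (or docker build) output from stdin.
missing_apt_keys() {
  grep -oE 'NO_PUBKEY [0-9A-F]+' | awk '{print $2}' | sort -u
}
```

For example, `apt update 2>&1 | missing_apt_keys` would be one way to use it inside the container.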
Possible Solution
According to https://forums.developer.nvidia.com/t/gpg-error-http-developer-download-nvidia-com-compute-cuda-repos-ubuntu1804-x86-64/212904/3, the way to resolve the problem on Debian-based systems is to remove the outdated key and install the current one:
sudo apt-key del 7fa2af80
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub
Your Environment
git log -1
commit d7560bbd39fec68f944515c2591dda74f3764b90 (HEAD -> develop, origin/develop, origin/HEAD)
Merge: ba4175bf b7dba6aa
Author: Nico Galoppo [email protected]
Date: Tue May 17 11:25:58 2022 -0500

    Merge pull request #4639 from openvinotoolkit/ncgalopp/fix-build
- Docker version:
Docker version 20.10.16, build aa7e414
- Operating System and version:
Ubuntu 20.04
- GPU:
P5000
- nvidia-smi:
NVIDIA-SMI 510.73.05 Driver Version: 510.73.05 CUDA Version: 11.6
Next steps
You may join our Gitter channel for community support.
Hello. I am struggling with this problem and it is very urgent, but I do not know how to resolve it. Maybe I am handling the Docker containers incorrectly or modifying the wrong file.
When I try to build .../cvat/serverless/tensorflow/matterport/mask_rcnn_fixed/nuclio/function-gpu.yaml with
nuctl deploy --project-name cvat \
  --path serverless/tensorflow/matterport/mask_rcnn/nuclio \
  --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
  --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
  --image cvat/tf.matterport.mask_rcnn_gpu \
  --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
  --resource-limit nvidia.com/gpu=1
the log indicates that this problem happens during the execution of line:
RUN apt update && apt install --no-install-recommends -y git curl
Here is the log:
22.08.02 17:47:07.249 nuctl (I) Deploying function {"name": ""}
22.08.02 17:47:07.249 nuctl (I) Building {"builderKind": "docker", "versionInfo": "Label: 1.9.1, Git commit: 5fb902dd1fafabed267f79b3267e19804ee93bda, OS: linux, Arch: amd64, Go version: go1.17.10", "name": ""}
22.08.02 17:47:07.436 nuctl (I) Staging files and preparing base images
22.08.02 17:47:07.436 nuctl (W) Python 3.6 runtime is deprecated and will soon not be supported. Please migrate your code and use Python 3.7 runtime (python:3.7) or higher
22.08.02 17:47:07.436 nuctl (I) Building processor image {"registryURL": "", "taggedImageName": "cvat/tf.matterport.mask_rcnn_gpu:latest"}
22.08.02 17:47:07.436 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.9.1-amd64"}
22.08.02 17:47:10.356 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
22.08.02 17:47:14.246 nuctl.platform (I) Building docker image {"image": "cvat/tf.matterport.mask_rcnn_gpu:latest"}
22.08.02 17:47:18.169 nuctl.platform.docker (W) Docker command outputted to stderr - this may result in errors {"workingDir": "/tmp/nuclio-build-568811864/staging", "cmd": "docker build --network host --force-rm -t cvat/tf.matterport.mask_rcnn_gpu:latest -f /tmp/nuclio-build-568811864/staging/Dockerfile.processor --build-arg NUCLIO_LABEL=1.9.1 --build-arg NUCLIO_ARCH=amd64 --build-arg NUCLIO_BUILD_LOCAL_HANDLER_DIR=handler .", "stderr": "The command '/bin/bash -c apt update && apt install --no-install-recommends -y git curl' returned a non-zero code: 100\n"}
22.08.02 17:47:18.175 nuctl (W) Failed to create a function; setting the function status {"err": "Failed to build processor image", "errVerbose": "\nError - exit status 100\n /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\nSending build context to Docker daemon 51.16MB\r\r\nStep 1/17 : FROM tensorflow/tensorflow:1.15.5-gpu-py3\n ---> 73be11373498\nStep 2/17 : ARG NUCLIO_LABEL\n ---> Using cache\n ---> ce09667e4588\nStep 3/17 : ARG NUCLIO_ARCH\n ---> Using cache\n ---> ee4549ac7db8\nStep 4/17 : ARG NUCLIO_BUILD_LOCAL_HANDLER_DIR\n ---> Using cache\n ---> 688565186b35\nStep 5/17 : COPY artifacts/processor /usr/local/bin/processor\n ---> Using cache\n ---> 48a3b91efbc1\nStep 6/17 : COPY artifacts/py /opt/nuclio/\n ---> Using cache\n ---> 39ba78f106bd\nStep 7/17 : COPY artifacts/py-whl /opt/nuclio/whl\n ---> Using cache\n ---> 221a56010c52\nStep 8/17 : COPY artifacts/uhttpc /usr/local/bin/uhttpc\n ---> Using cache\n ---> 09519af89f11\nStep 9/17 : COPY handler /opt/nuclio\n ---> Using cache\n ---> f849808a29d6\nStep 10/17 : HEALTHCHECK --interval=1s --timeout=3s CMD /usr/local/bin/uhttpc --url http://127.0.0.1:8082/ready || exit 1\n ---> Using cache\n ---> b0500d1c8d03\nStep 11/17 : RUN pip install nuclio-sdk msgpack --no-index --find-links /opt/nuclio/whl\n ---> Using cache\n ---> a965b5f4b9aa\nStep 12/17 : WORKDIR /opt/nuclio\n ---> Using cache\n ---> 24c47938ae64\nStep 13/17 : RUN apt update && apt install --no-install-recommends -y git curl\n ---> Running in 3dff9034dfcc\n\u001b[91m\nWARNING: apt does not have a stable CLI interface. Use with caution in scripts.\n\n\u001b[0mHit:1 http://archive.ubuntu.com/ubuntu bionic InRelease\nGet:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease [1581 B]\nGet:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]\nGet:4 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]\nGet:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]\nIgn:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease\nGet:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release [564 B]\nGet:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg [833 B]\nErr:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease\n The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC\nGet:9 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [1107 kB]\nGet:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Packages [73.8 kB]\nGet:11 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.8 kB]\nGet:12 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3336 kB]\nGet:13 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1527 kB]\nGet:14 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2306 kB]\nGet:15 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [12.2 kB]\nGet:16 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [12.9 kB]\nGet:17 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [22.8 kB]\nGet:18 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2905 kB]\nGet:19 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [1065 kB]\nReading package lists...\n\u001b[91mW: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC\nE: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.\n\u001b[0mRemoving intermediate container 3dff9034dfcc\n\nstderr:\nThe command '/bin/bash -c apt update && apt install --no-install-recommends -y git curl' returned a non-zero code: 100\n\n /nuclio/pkg/cmdrunner/shellrunner.go:96\nFailed to build\n /nuclio/pkg/dockerclient/shell.go:117\nFailed to build docker image\n .../pkg/containerimagebuilderpusher/docker.go:54\nFailed to build processor image\n /nuclio/pkg/processor/build/builder.go:263\nFailed to build processor image"}
Error - exit status 100
    /nuclio/pkg/cmdrunner/shellrunner.go:96
Call stack:
stdout:
Sending build context to Docker daemon 51.16MB
Step 1/17 : FROM tensorflow/tensorflow:1.15.5-gpu-py3
 ---> 73be11373498
Step 2/17 : ARG NUCLIO_LABEL
 ---> Using cache
 ---> ce09667e4588
Step 3/17 : ARG NUCLIO_ARCH
 ---> Using cache
 ---> ee4549ac7db8
Step 4/17 : ARG NUCLIO_BUILD_LOCAL_HANDLER_DIR
 ---> Using cache
 ---> 688565186b35
Step 5/17 : COPY artifacts/processor /usr/local/bin/processor
 ---> Using cache
 ---> 48a3b91efbc1
Step 6/17 : COPY artifacts/py /opt/nuclio/
 ---> Using cache
 ---> 39ba78f106bd
Step 7/17 : COPY artifacts/py-whl /opt/nuclio/whl
 ---> Using cache
 ---> 221a56010c52
Step 8/17 : COPY artifacts/uhttpc /usr/local/bin/uhttpc
 ---> Using cache
 ---> 09519af89f11
Step 9/17 : COPY handler /opt/nuclio
 ---> Using cache
 ---> f849808a29d6
Step 10/17 : HEALTHCHECK --interval=1s --timeout=3s CMD /usr/local/bin/uhttpc --url http://127.0.0.1:8082/ready || exit 1
 ---> Using cache
 ---> b0500d1c8d03
Step 11/17 : RUN pip install nuclio-sdk msgpack --no-index --find-links /opt/nuclio/whl
 ---> Using cache
 ---> a965b5f4b9aa
Step 12/17 : WORKDIR /opt/nuclio
 ---> Using cache
 ---> 24c47938ae64
Step 13/17 : RUN apt update && apt install --no-install-recommends -y git curl
 ---> Running in 3dff9034dfcc

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Hit:1 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease [1581 B]
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:4 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease
Get:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release [564 B]
Get:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg [833 B]
Err:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
Get:9 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [1107 kB]
Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Packages [73.8 kB]
Get:11 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.8 kB]
Get:12 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3336 kB]
Get:13 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1527 kB]
Get:14 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2306 kB]
Get:15 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [12.2 kB]
Get:16 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [12.9 kB]
Get:17 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [22.8 kB]
Get:18 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2905 kB]
Get:19 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [1065 kB]
Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.
Removing intermediate container 3dff9034dfcc

stderr:
The command '/bin/bash -c apt update && apt install --no-install-recommends -y git curl' returned a non-zero code: 100

    /nuclio/pkg/cmdrunner/shellrunner.go:96
Failed to build
    /nuclio/pkg/dockerclient/shell.go:117
Failed to build docker image
    .../pkg/containerimagebuilderpusher/docker.go:54
Failed to build processor image
    /nuclio/pkg/processor/build/builder.go:263
Failed to deploy function
    ...//nuclio/pkg/platform/abstract/platform.go:198
I have tried modifying the file .../cvat/serverless/tensorflow/matterport/mask_rcnn_fixed/nuclio/function-gpu.yaml and running the command again, but the log stays the same: no additional steps were executed between
Step 12/17 : WORKDIR /opt/nuclio ---> Using cache ---> 24c47938ae64
and
Step 13/17 : RUN apt update && apt install --no-install-recommends -y git curl ---> Running in 3dff9034dfcc
The additional steps I wanted to add are the commands from https://github.com/NVIDIA/nvidia-container-toolkit/issues/257
I edited this fragment of the function file:
build:
  image: cvat/tf.matterport.mask_rcnn
  baseImage: tensorflow/tensorflow:1.15.5-gpu-py3
  directives:
    postCopy:
      - kind: WORKDIR
        value: /opt/nuclio
      - kind: RUN
        value: rm /etc/apt/sources.list.d/cuda.list
      - kind: RUN
        value: rm /etc/apt/sources.list.d/nvidia-ml.list
      - kind: RUN
        value: apt update && apt install --no-install-recommends -y git curl
      - kind: RUN
        value: git clone --depth 1 https://github.com/matterport/Mask_RCNN.git
      - kind: RUN
        value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5
      - kind: RUN
        value: pip3 install numpy cython pyyaml keras==2.1.0 scikit-image Pillow
Unfortunately, the new steps did not appear in the log above. I am wondering whether a copy of this file is cached somewhere in Docker, which would explain why the new commands are not picked up, or whether a different file is used, or whether my commands are wrong and therefore not executed. Whichever scenario it is, I have decided to ask for help here.
Solving this would also resolve this issue.
The matter is very important and urgent. Many people are simultaneously running heavy computations in that Docker container on the CPU instead of the GPU just because of this failure.
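On the question of whether a different file is being used: the `nuctl deploy` command quoted above passes `--path serverless/tensorflow/matterport/mask_rcnn/nuclio`, while the edited file lives under `mask_rcnn_fixed`, so one quick check is to see which function files in the deployed directory actually contain the new directive. The sketch below is only a diagnostic aid; the helper name `find_directive` is hypothetical, and which of the files nuctl reads is an assumption that is not verified here.

```shell
# Hypothetical helper: list the function*.yaml files in a directory that
# contain a given literal string, so you can see whether an edit landed in
# the directory that is passed to `nuctl deploy --path`.
find_directive() {
  dir="$1"
  pattern="$2"
  # -l prints only matching file names, -F treats the pattern literally.
  grep -lF -- "$pattern" "$dir"/function*.yaml 2>/dev/null
}
```

For example, `find_directive serverless/tensorflow/matterport/mask_rcnn/nuclio 'sources.list.d/cuda.list'` prints nothing if the edit is not present in that directory.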
Have you solved it yet? I searched everywhere and this is the only issue I found that matches my problem.
@belkahorry Actually, I decided against building a Docker image on my own and preferred to wait for an update from Nvidia.
I modified serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml (not function-gpu.yaml) and added
apt-key del 7fa2af80 && apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub
before apt update. With that change the image builds successfully:
postCopy:
  - kind: WORKDIR
    value: /opt/nuclio
  - kind: RUN
    value: apt-key del 7fa2af80 && apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub && apt update && apt install --no-install-recommends -y git curl
  - kind: RUN
    value: git clone --depth 1 https://github.com/matterport/Mask_RCNN.git
  - kind: RUN
    value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5
  - kind: RUN
    value: pip3 install numpy cython pyyaml keras==2.1.0 scikit-image 'imageio<=2.9.0' Pillow
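If you would rather not edit the file by hand, the same change to the `apt update` directive could be scripted. This is a rough sketch, not an official CVAT tool: the function name `patch_function_yaml` and the variable `KEY_URL` are hypothetical, the key ID and URL are the ones quoted in this thread, and it assumes the directive line starts with `value: apt update` as in the file above.

```shell
# Key URL as quoted in this thread (assumption: still valid when you run this).
KEY_URL='http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub'

# Prepend the key refresh to the `apt update` directive of a nuclio
# function file. Edits the file in place and leaves a .bak copy next to it.
patch_function_yaml() {
  file="$1"
  # Do nothing if the fix is already present, so repeated runs are harmless.
  if grep -qF 'apt-key adv --fetch-keys' "$file"; then
    return 0
  fi
  # `|` is the sed delimiter because the URL contains slashes;
  # `\&` is a literal ampersand in the replacement text.
  sed -i.bak "s|value: apt update|value: apt-key del 7fa2af80 \&\& apt-key adv --fetch-keys ${KEY_URL} \&\& apt update|" "$file"
}
```

It would be called as `patch_function_yaml serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml` before running `nuctl deploy` again.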
Thanks @brucefay1115! It works! My system information:
- Python 3.7
- Ubuntu 18.04
- NVIDIA-SMI 470.141.03, Driver Version: 470.141.03, CUDA Version: 11.4
BTW, my first attempt did not work because another container was still active. I had to run the two commands below and then retry:
docker-compose down
docker ps -aq | xargs docker stop | xargs docker rm