
19.01-caffe container doesn't support Tesla K80

Open up-to-you opened this issue 6 years ago • 5 comments

Environment:
OS: CentOS
Driver: 410.79
CUDA Version: 10.0

Starting training of a trivial Object Detection model fails with: Caffe: Check failed: error == cudaSuccess (48 vs. 0) no kernel image is available for execution on the device

Some investigation suggests that the Caffe module needs to be compiled with a specific compatibility option (if I'm not mistaken). Either way, the incompatibility is very frustrating, since we are talking about delivery in Docker containers.

Are there any plans for NVIDIA software to support NVIDIA hardware? Or are there any workarounds for now?

up-to-you avatar Feb 25 '19 22:02 up-to-you

Same issue here!

The error message is exactly the same.

I have to use 18.08 due to my CUDA setup. I'm running CentOS on a K40 card.

backtozero avatar Feb 26 '19 17:02 backtozero

Please take a look at this issue: https://github.com/NVIDIA/DIGITS/issues/1863

It seems that all you need to do is add architecture 37 (compute capability 3.7) to -DCUDA_ARCH_BIN for the K80.
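A minimal sketch of what that might look like, assuming the container's Caffe is configured with CMake and that its CUDA settings accept CUDA_ARCH_NAME=Manual together with CUDA_ARCH_BIN, as upstream Caffe's CMake files do (this is not confirmed for the NGC build). The helper name caffe_cmake_cmd and the architecture list are hypothetical. The function only prints the command, so it can be inspected before being run from a Caffe build directory.

```shell
# Hypothetical helper: print (not run) a cmake configure command that
# puts the given CUDA architectures into CUDA_ARCH_BIN.
caffe_cmake_cmd() {
  # "$*" joins all arguments with spaces, e.g. "37 60 61 70".
  printf 'cmake -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="%s" ..\n' "$*"
}

# 37 covers the K80; 60, 61 and 70 are an assumed list of the
# architectures the image already targets.
caffe_cmake_cmd 37 60 61 70
```

A K40 would need 35 in the list instead of (or in addition to) 37.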

I would gladly fix it myself, but I don't have the Dockerfile source for the latest nvidia-docker images.

up-to-you avatar Mar 01 '19 20:03 up-to-you

Prebuilt NGC container images target Pascal and newer GPU architectures, so you will need to build your own Docker environment manually. See the changelog here: https://docs.nvidia.com/deeplearning/digits/digits-release-notes/rel_19-01.html#rel_19-01
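As a rough illustration of that cutoff, here is a small sketch that classifies a card by its compute capability, assuming "Pascal and up" means compute capability 6.0 or higher. The function name is hypothetical and the capability values are typed in by hand; nothing here queries the GPU.

```shell
# Hypothetical helper: succeed if the given compute capability
# (e.g. "3.7") is Pascal (6.x) or newer.
is_pascal_or_newer() {
  major="${1%%.*}"   # keep only the part before the first dot
  [ "$major" -ge 6 ]
}

# 3.5 = K40, 3.7 = K80, 6.1 and 7.0 are Pascal and Volta examples.
for cc in 3.5 3.7 6.1 7.0; do
  if is_pascal_or_newer "$cc"; then
    echo "$cc: should be covered by the prebuilt image"
  else
    echo "$cc: needs an older image or a custom build"
  fi
done
```

By this check both the K40 and the K80 fall below the cutoff, which matches the reports in this thread.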

NevesLucas avatar Apr 01 '19 20:04 NevesLucas

I managed to get it working with the 18.08 image.

sudo docker pull nvcr.io/nvidia/digits:18.08

and then

nvidia-docker run --name digits -d -p 80:5000 \
  -v /home/$USER/data/myFolder/training/images:/training-images \
  -v /home/$USER/data/myFolder/training/labels:/training-labels \
  -v /home/$USER/data/myFolder/validate/images:/validate-images \
  -v /home/$USER/data/myFolder/validate/labels:/validate-labels \
  nvcr.io/nvidia/digits:18.08

You can then browse to your host on port 80. Change the host port (the 80 in -p 80:5000) if needed.

HEBOS avatar Sep 10 '19 15:09 HEBOS

@HEBOS Yep, it works with 18.08, but the bug still reproduces on 19.01+ versions. It reminds me of one of Linus's speeches: https://youtu.be/iYWzMvlj2RQ?t=29

This repository is abandoned, no doubt.

up-to-you avatar Sep 11 '19 13:09 up-to-you