DIGITS
19.01-caffe container doesn't support Tesla K80
Environment:
OS: CentOS
Driver: 410.79
CUDA Version: 10.0
Starting training of a trivial Object Detection model results in:
Caffe: Check failed: error == cudaSuccess (48 vs. 0) no kernel image is available for execution on the device
Some investigation suggests that the Caffe module needs to be compiled with a specific compatibility option (if I'm not mistaken). Either way, the incompatibility is very frustrating, since we are talking about delivery in Docker containers.
Are there any plans for NVIDIA software to support NVIDIA hardware? Or are there any workarounds for now?
Same issue here!
The error message is exactly the same.
I have to use 18.08 due to my CUDA setup. I'm running CentOS on a K40 card.
Please pay attention to this issue: https://github.com/NVIDIA/DIGITS/issues/1863
It seems that all you need to do is add the 37 architecture (compute capability 3.7) to -DCUDA_ARCH_BIN for the K80.
I would gladly fix it myself, but I don't have the Dockerfile source for the latest nvidia-docker images.
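To make the -DCUDA_ARCH_BIN suggestion above concrete, here is a minimal sketch of adding 37 to an architecture list before a rebuild. The helper `add_arch` is hypothetical, the "60 61 70" list is only an assumed Pascal-and-up default (the real list used for the NGC images is not given in this thread), and `-DCUDA_ARCH_NAME=Manual` is an assumption about how the Caffe CMake build accepts a manual list. The script only prints the cmake line; it does not run it.

```shell
# Hypothetical helper: append a compute-capability code (e.g. 37 for the K80)
# to a space-separated architecture list unless it is already present.
add_arch() {
    list="$1"
    arch="$2"
    case " $list " in
        *" $arch "*) printf '%s\n' "$list" ;;              # already listed, keep as-is
        *) printf '%s\n' "${list:+$list }$arch" ;;         # append, no stray space if list is empty
    esac
}

# ASSUMPTION: "60 61 70" stands in for a Pascal-and-up default list.
ARCH_BIN="$(add_arch "60 61 70" 37)"

# ASSUMPTION: CUDA_ARCH_NAME=Manual is needed for CUDA_ARCH_BIN to take effect.
echo "cmake -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN=\"$ARCH_BIN\" .."
```

For a K40 the same idea would apply with 35 instead of 37.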
Prebuilt NGC container images target Pascal and newer GPU architectures, so you will need to build your own Docker environment manually. See the changelog here: https://docs.nvidia.com/deeplearning/digits/digits-release-notes/rel_19-01.html#rel_19-01
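As a rough illustration of what "Pascal and up" means for the cards in this thread: Pascal starts at compute capability 6.0, while the K40 is 3.5 and the K80 is 3.7. The function `cc_at_least` below is a hypothetical helper, not part of any NVIDIA tool, and it assumes the capability is written as major.minor with a single-digit minor.

```shell
# Hypothetical helper: succeed (exit status 0) when compute capability $1
# (e.g. "3.7") is at least $2 (default "6.0", the first Pascal capability).
# ASSUMPTION: capabilities have a single-digit minor part, so dropping the
# dot gives comparable integers (3.7 -> 37, 6.0 -> 60).
cc_at_least() {
    cc="$1"
    min="${2:-6.0}"
    [ "$(printf '%s' "$cc" | tr -d '.')" -ge "$(printf '%s' "$min" | tr -d '.')" ]
}

# K40 = 3.5, K80 = 3.7, first Pascal capability = 6.0
for cc in 3.5 3.7 6.0; do
    if cc_at_least "$cc"; then
        echo "$cc: covered by a Pascal-and-up build"
    else
        echo "$cc: not covered, expect 'no kernel image is available for execution on the device'"
    fi
done
```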
I managed to get it working with 18.08 image.
sudo docker pull nvcr.io/nvidia/digits:18.08
and then
nvidia-docker run --name digits -d -p 80:5000 \
  -v /home/$USER/data/myFolder/training/images:/training-images \
  -v /home/$USER/data/myFolder/training/labels:/training-labels \
  -v /home/$USER/data/myFolder/validate/images:/validate-images \
  -v /home/$USER/data/myFolder/validate/labels:/validate-labels \
  nvcr.io/nvidia/digits:18.08
You can then browse to your host on port 80. Change the port if needed.
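If you need a different port or image tag, a small wrapper can help. This is only a sketch: `digits_run_cmd` is a hypothetical function that prints the nvidia-docker command rather than running it, and the data directory in the example call is a placeholder.

```shell
# Hypothetical wrapper: print (do not run) the nvidia-docker command from
# above for a given host port, image tag, and data directory.
digits_run_cmd() {
    port="${1:-80}"
    tag="${2:-18.08}"
    data="${3:-/home/$USER/data/myFolder}"
    printf 'nvidia-docker run --name digits -d -p %s:5000' "$port"
    # One bind mount per split/kind pair, mirroring the four -v flags above.
    for split in training validate; do
        for kind in images labels; do
            printf ' -v %s/%s/%s:/%s-%s' "$data" "$split" "$kind" "$split" "$kind"
        done
    done
    printf ' nvcr.io/nvidia/digits:%s\n' "$tag"
}

# Example: serve DIGITS on host port 8080 (the data path is a placeholder).
digits_run_cmd 8080 18.08 /home/me/data/myFolder
```

Review the printed command first, then paste it into your shell.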
@HEBOS yep, while it works with 18.08, the bug reproduces on 19.01+ versions. I recall one of Linus's speeches: https://youtu.be/iYWzMvlj2RQ?t=29
This repository is abandoned, no doubt