
Cannot run on Azure instance CUDA10.1+Tesla K80

Status: Open. jpsalada opened this issue 4 years ago (6 comments).

Summary of your issue

Hi,

I successfully ran DlibDotNet on a desktop computer with a GeForce RTX 2080.

Now I am trying to move this to an Azure instance that has a Tesla K80. However, when I call dlib (specifically, the LossMetric operator on an image), it fails with the following error: InvalidDeviceFunction

From what I could tell from https://en.wikipedia.org/wiki/CUDA, the K80's compute capability (3.7) is supported by CUDA 10.1.

Has anyone managed to get DlibDotNet running on a Tesla K80?

Environment

Tesla K80, Windows Server 2019, CUDA 10.1

nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.96       Driver Version: 418.96       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80            TCC | 00000001:00:00.0 Off |                    0 |
| N/A   37C    P8    32W / 149W |      1MiB / 11448MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

DeviceQuery output:

deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla K80"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    3.7
  Total amount of global memory:                 11448 MBytes (12004491264 bytes)
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Max Clock rate:                            824 MHz (0.82 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               zu bytes
  Total amount of shared memory per block:       zu bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):    (2147483647, 65535, 65535)
  Maximum memory pitch:                          zu bytes
  Texture alignment:                             zu bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  CUDA Device Driver Mode (TCC or WDDM):         TCC (Tesla Compute Cluster Driver)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   1 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1, Device0 = Tesla K80
Result = PASS

jpsalada, Sep 28 '20 16:09

@jpsalada I'm not familiar with Microsoft Azure, but I suspect your VM does not get a full GPU, because a Tesla K80 can have 24 GB of memory rather than 12 GB.

See https://docs.microsoft.com/ja-jp/azure/virtual-machines/nc-series. Are you using a Standard_NC6 instance?

This difference may be causing your issue.
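The memory and core counts reported in this issue can be cross-checked against the board's specifications. A Tesla K80 board carries two GK210 GPUs, each with 12 GB of memory and 2496 CUDA cores (24 GB and 4992 cores for the whole board). Each GPU is exposed as a separate CUDA device. The deviceQuery figures in this issue match exactly one of those two GPUs:

$$
\begin{aligned}
\frac{12\,004\,491\,264\ \text{B}}{2^{20}\ \text{B/MiB}} &= 11448.375\ \text{MiB} \approx 12.0\ \text{GB} = \tfrac{1}{2} \times 24\ \text{GB},\\
13\ \text{SMs} \times 192\ \text{cores/SM} &= 2496\ \text{cores} = \tfrac{1}{2} \times 4992\ \text{cores}.
\end{aligned}
$$

This is consistent with the nvidia-smi total of 11448MiB and with the VM being given one of the board's two GPUs.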

takuya-takeuchi, Sep 28 '20 16:09

Yes, I am using exactly that instance. It uses half of the GPU. Do you think using only half of the GPU could be the problem?

12 GB is still a lot of memory, more than the RTX 2080 (8 GB) on which I ran without any issue.

jpsalada, Sep 28 '20 16:09

@jpsalada I'm not sure whether using half of the GPU is the problem, because I have never tried that setup. My guess is that Windows Server assigns the discrete GPU to the VM via Discrete Device Assignment, and that may have an effect.

That said, I don't think this issue is caused by DlibDotNet, since DlibDotNet does not control the CUDA device. So we should check whether dlib (C++) can be built on the Azure VM and whether it works correctly there.

> 12GB is still a lot of memory, and a higher value than the one with the RTX 2080 (8GB), where I ran without any issue.

This issue is not related to memory size. My guess is that the operating system cannot give dlib proper information about the GPU. You could test DlibDotNet on a VM with a full GPU.

takuya-takeuchi, Sep 30 '20 15:09

@takuya-takeuchi I tried another instance that uses half of an NVIDIA Tesla M60, which has a newer compute capability than the K80 (5.2 vs. 3.7), and it worked.

Do you think Windows 10 would be unaffected by this apparent issue with the NVIDIA K80 on Azure?

jpsalada, Oct 01 '20 08:10

@jpsalada To be honest, I don't have a Tesla K80. However, NVIDIA's driver page for it recommends CUDA 7.0 and does not restrict its use on Windows 10: https://www.nvidia.co.jp/download/driverResults.aspx/88676/en

So it might work with a DlibDotNet build for an older CUDA version. However, the oldest CUDA version DlibDotNet is built for is 9.2, and the current dlib source code has dropped support for old CUDA versions, so we cannot use legacy CUDA 7.0.

takuya-takeuchi, Oct 01 '20 14:10

https://forums.developer.nvidia.com/t/nvidia-tesla-k80-cuda-version-support/67676/7

However, the problem may not be related to the CUDA version, and that thread assumes the card is running on Linux.

takuya-takeuchi, Oct 01 '20 15:10