
Using gpu-manager with CUDA driver 11.6: "Function Not Found" in Memory-Usage when running nvidia-smi in a container

Open • WindyLQL opened this issue 3 years ago • 2 comments

Problem

When gpu-manager is used with CUDA 11.6, is its memory interface missing a hijacked function?

Details:

FB Memory Usage
    Total    : Function Not Found
    Reserved : Function Not Found
    Used     : Function Not Found
    Free     : Function Not Found

Running the nvidia-smi command in the container gives the following output:

xx:/# kubectl exec vcuda nvidia-smi

Mon May 30 03:24:18 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          Off  | 00000000:1A:00.0 Off |                    0 |
|  0%   50C    P0    58W / 150W | Function Not Found   |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    648808      C                                    1367MiB |
+-----------------------------------------------------------------------------+

The current gpu-manager supports CUDA driver 11.5.1. Can anyone add support for CUDA driver 11.6?

WindyLQL • May 30 '22 03:05
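"Function Not Found" is the message NVML associates with NVML_ERROR_FUNCTION_NOT_FOUND (value 13 in nvml.h). This C sketch is a simplified model, not vcuda_controller's actual code: it assumes a hook library that dispatches NVML calls by symbol name through a table, and hook_dispatch, hook_table and the HOOK_* codes are hypothetical names. A lookup for an entry point the table does not know, such as nvmlDeviceGetMemoryInfo_v2, falls through to that error code.

```c
#include <stddef.h>
#include <string.h>

/* Two return codes copied from nvml.h: NVML_SUCCESS is 0 and
 * NVML_ERROR_FUNCTION_NOT_FOUND is 13 ("Function Not Found"). */
typedef enum {
    HOOK_SUCCESS = 0,
    HOOK_ERROR_FUNCTION_NOT_FOUND = 13
} hook_ret_t;

typedef hook_ret_t (*hook_fn_t)(void);

/* Stand-in for a hooked v1 memory query; a real hook would fill an nvmlMemory_t. */
static hook_ret_t hooked_memory_info_v1(void) { return HOOK_SUCCESS; }

/* Hypothetical hook table: only the original entry point is registered,
 * so there is no handler for nvmlDeviceGetMemoryInfo_v2. */
static const struct {
    const char *name;
    hook_fn_t fn;
} hook_table[] = {
    { "nvmlDeviceGetMemoryInfo", hooked_memory_info_v1 },
};

/* Look up a handler by NVML symbol name and call it; unknown names fall
 * through to the "Function Not Found" code. */
hook_ret_t hook_dispatch(const char *name)
{
    for (size_t i = 0; i < sizeof hook_table / sizeof hook_table[0]; ++i) {
        if (strcmp(hook_table[i].name, name) == 0)
            return hook_table[i].fn();
    }
    return HOOK_ERROR_FUNCTION_NOT_FOUND;
}
```

In this model, hook_dispatch("nvmlDeviceGetMemoryInfo") returns HOOK_SUCCESS, while hook_dispatch("nvmlDeviceGetMemoryInfo_v2") returns HOOK_ERROR_FUNCTION_NOT_FOUND.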

Not yet. I worked around this issue by downgrading the GPU driver to 11.3. I hope the community will support this driver version soon.

------------------ Original message ------------------
From: "tkestack/gpu-manager" @.>;
Sent: Tuesday, June 28, 2022, 11:33 AM
@.>; @.@.>;
Subject: Re: [tkestack/gpu-manager] use gpu-manager in cuda drvier11.6 , Function Not Found in Memory-Usage when use nvidia-smi in container (Issue #159)

@WindyLQL Hi, I have the same problem. Did you solve it?


WindyLQL • Jun 29 '22 13:06

Adding the nvmlDeviceGetMemoryInfo_v2 function to vcuda_controller can solve this problem.

seanchen022 • Oct 09 '22 04:10
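A minimal sketch, in C, of what such an nvmlDeviceGetMemoryInfo_v2 hook could look like. It assumes the nvmlMemory_v2_t layout from newer nvml.h releases (version, total, reserved, free, used; check the header you build against) and mirrors it locally as mem_v2_t so the example compiles without the NVIDIA headers. vcuda_quota_t, hooked_memory_info_v2, MEM_V2_VERSION and the HOOK_* codes are hypothetical names. The real entry point takes an nvmlDevice_t, and a real hook would look up the container's limit and usage for that device instead of receiving them as an argument.

```c
#include <stddef.h>

/* Local mirror of nvmlMemory_v2_t (assumed field order; verify against nvml.h). */
typedef struct {
    unsigned int version;
    unsigned long long total;
    unsigned long long reserved;
    unsigned long long free;
    unsigned long long used;
} mem_v2_t;

/* Same encoding as NVML_STRUCT_VERSION(Memory, 2): struct size in the low
 * bits, version number in the top byte. */
#define MEM_V2_VERSION ((unsigned int)(sizeof(mem_v2_t) | (2U << 24U)))

typedef enum {
    HOOK_SUCCESS = 0,               /* NVML_SUCCESS */
    HOOK_ERROR_INVALID_ARGUMENT = 2 /* NVML_ERROR_INVALID_ARGUMENT */
} hook_ret_t;

/* Hypothetical per-container quota as a hook library might track it. */
typedef struct {
    unsigned long long limit_bytes; /* memory granted to the container */
    unsigned long long used_bytes;  /* memory currently used by its processes */
} vcuda_quota_t;

/* Report the container's quota instead of the physical card's memory. */
hook_ret_t hooked_memory_info_v2(const vcuda_quota_t *quota, mem_v2_t *mem)
{
    if (quota == NULL || mem == NULL)
        return HOOK_ERROR_INVALID_ARGUMENT;
    /* Real NVML reports a bad version with NVML_ERROR_ARGUMENT_VERSION_MISMATCH;
     * this sketch reuses INVALID_ARGUMENT to keep the enum small. */
    if (mem->version != MEM_V2_VERSION)
        return HOOK_ERROR_INVALID_ARGUMENT;

    unsigned long long used = quota->used_bytes;
    if (used > quota->limit_bytes)
        used = quota->limit_bytes; /* clamp so free never underflows */

    mem->total = quota->limit_bytes;
    mem->reserved = 0; /* nothing is set aside inside the quota */
    mem->used = used;
    mem->free = quota->limit_bytes - used;
    return HOOK_SUCCESS;
}
```

As with the real versioned API, the caller fills in the version field before the call (here mem.version = MEM_V2_VERSION). With a 4096-byte limit and 1024 bytes in use, the hook reports total 4096, used 1024, free 3072 and reserved 0.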