
GPU-mem is the whole GB value, not MB value

Open reverson opened this issue 5 years ago • 13 comments

Right now, on a g3s.xlarge instance, I'm seeing the gpu-mem value set to 7, even though the host has 1 GPU with about 7 GB of memory (7618 MiB according to nvidia-smi).

If I try to schedule a fraction of gpu-mem (1.5 for example) I'm told I need to use a whole integer.

Should the plugin be exporting 7618 as the gpu-mem value?

reverson avatar Mar 15 '19 18:03 reverson

Yes, if you want to use 7618 MiB, you should change the unit to MiB in https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/device-plugin-ds.yaml#L28.

cheyang avatar Mar 16 '19 03:03 cheyang
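A minimal sketch of the arithmetic behind these numbers, assuming the plugin advertises one fake device per configured memory unit (the function and variable names below are made up for illustration; this is not the plugin's actual code):

// Illustration only: how the advertised aliyun.com/gpu-mem capacity and the
// number of fake device IDs depend on the configured memory unit.
package main

import "fmt"

func fakeDeviceIDs(gpuUUID string, memMiB int, unitIsGiB bool) []string {
	count := memMiB // MiB unit: one fake device per MiB
	if unitIsGiB {
		count = memMiB / 1024 // GiB unit: one per whole GiB, the fraction is dropped
	}
	ids := make([]string, 0, count)
	for i := 0; i < count; i++ {
		ids = append(ids, fmt.Sprintf("%s-_-%d", gpuUUID, i))
	}
	return ids
}

func main() {
	// The g3s.xlarge from the report: one GPU with 7618 MiB.
	fmt.Println("GiB unit:", len(fakeDeviceIDs("GPU-xxxx", 7618, true)))  // 7, hence gpu-mem = 7
	fmt.Println("MiB unit:", len(fakeDeviceIDs("GPU-xxxx", 7618, false))) // 7618
}

Kubernetes extended resources must be whole integers, which is why a request of 1.5 is rejected; switching the unit to MiB gives finer granularity instead.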

Yes, if you want to use 7618 MiB, you should change the unit to MiB in https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/device-plugin-ds.yaml#L28.

  • I changed the unit to MiB and recreated device-plugin-ds, and then the node's kubelet.service reported a gRPC error:
Mar 19 06:58:31 k8s-node-1 kubelet[12836]: E0319 06:58:31.266996   12836 endpoint.go:106] listAndWatch ended unexpectedly for device plugin aliyun.com/gpu-mem with error rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5862768 vs. 4194304)
Mar 19 06:58:31 k8s-node-1 kubelet[12836]: I0319 06:58:31.267070   12836 manager.go:430] Mark all resources Unhealthy for resource aliyun.com/gpu-mem
  • The gpushare device plugin pod logs are as below:
I0319 13:58:30.902668       1 main.go:18] Start gpushare device plugin
I0319 13:58:30.902780       1 gpumanager.go:28] Loading NVML
I0319 13:58:30.908589       1 gpumanager.go:37] Fetching devices.
I0319 13:58:30.908639       1 gpumanager.go:43] Starting FS watcher.
I0319 13:58:30.908785       1 gpumanager.go:51] Starting OS watcher.
I0319 13:58:30.924544       1 nvidia.go:64] Deivce GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75's Path is /dev/nvidia0
I0319 13:58:30.924630       1 nvidia.go:69] # device Memory: 12196
I0319 13:58:30.924649       1 nvidia.go:40] set gpu memory: 12196
I0319 13:58:30.924659       1 nvidia.go:76] # Add first device ID: GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75-_-0
I0319 13:58:30.935332       1 nvidia.go:79] # Add last device ID: GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75-_-12195
I0319 13:58:30.950346       1 nvidia.go:64] Deivce GPU-a12a3921-ea32-1160-c3b0-394b977ffc84's Path is /dev/nvidia1
I0319 13:58:30.950378       1 nvidia.go:69] # device Memory: 12196
I0319 13:58:30.950388       1 nvidia.go:76] # Add first device ID: GPU-a12a3921-ea32-1160-c3b0-394b977ffc84-_-0
I0319 13:58:30.959102       1 nvidia.go:79] # Add last device ID: GPU-a12a3921-ea32-1160-c3b0-394b977ffc84-_-12195
I0319 13:58:30.985063       1 nvidia.go:64] Deivce GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181's Path is /dev/nvidia2
I0319 13:58:30.985110       1 nvidia.go:69] # device Memory: 12196
I0319 13:58:30.985119       1 nvidia.go:76] # Add first device ID: GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181-_-0
I0319 13:58:30.995293       1 nvidia.go:79] # Add last device ID: GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181-_-12195
I0319 13:58:31.047900       1 nvidia.go:64] Deivce GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c's Path is /dev/nvidia3
I0319 13:58:31.047935       1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.047946       1 nvidia.go:76] # Add first device ID: GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c-_-0
I0319 13:58:31.054558       1 nvidia.go:79] # Add last device ID: GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c-_-12195
I0319 13:58:31.087392       1 nvidia.go:64] Deivce GPU-c9d55403-db94-541a-098e-aa1a4fac438c's Path is /dev/nvidia4
I0319 13:58:31.087415       1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.087423       1 nvidia.go:76] # Add first device ID: GPU-c9d55403-db94-541a-098e-aa1a4fac438c-_-0
I0319 13:58:31.093386       1 nvidia.go:79] # Add last device ID: GPU-c9d55403-db94-541a-098e-aa1a4fac438c-_-12195
I0319 13:58:31.124518       1 nvidia.go:64] Deivce GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579's Path is /dev/nvidia5
I0319 13:58:31.124535       1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.124541       1 nvidia.go:76] # Add first device ID: GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579-_-0
I0319 13:58:31.134973       1 nvidia.go:79] # Add last device ID: GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579-_-12195
I0319 13:58:31.171276       1 nvidia.go:64] Deivce GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363's Path is /dev/nvidia6
I0319 13:58:31.171312       1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.171323       1 nvidia.go:76] # Add first device ID: GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363-_-0
I0319 13:58:31.179836       1 nvidia.go:79] # Add last device ID: GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363-_-12195
I0319 13:58:31.215859       1 nvidia.go:64] Deivce GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750's Path is /dev/nvidia7
I0319 13:58:31.215904       1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.215916       1 nvidia.go:76] # Add first device ID: GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750-_-0
I0319 13:58:31.223627       1 nvidia.go:79] # Add last device ID: GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750-_-12195
I0319 13:58:31.223647       1 server.go:43] Device Map: map[GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75:0 GPU-a12a3921-ea32-1160-c3b0-394b977ffc84:1 GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181:2 GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c:3 GPU-c9d55403-db94-541a-098e-aa1a4fac438c:4 GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579:5 GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363:6 GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750:7]
I0319 13:58:31.223707       1 server.go:44] Device List: [GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363 GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750 GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75 GPU-a12a3921-ea32-1160-c3b0-394b977ffc84 GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181 GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c GPU-c9d55403-db94-541a-098e-aa1a4fac438c GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579]
I0319 13:58:31.248160       1 podmanager.go:68] No need to update Capacity aliyun.com/gpu-count
I0319 13:58:31.249329       1 server.go:222] Starting to serve on /var/lib/kubelet/device-plugins/aliyungpushare.sock
I0319 13:58:31.250685       1 server.go:230] Registered device plugin with Kubelet
  • My nvidia-smi output on the physical machine is below. I think using the MiB unit with multiple cards causes the gRPC stream payload to overflow:
Tue Mar 19 07:09:10 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    On   | 00000000:04:00.0 Off |                  N/A |
| 23%   30C    P8     7W / 250W |      1MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    On   | 00000000:05:00.0 Off |                  N/A |
| 23%   29C    P8     7W / 250W |      1MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    On   | 00000000:08:00.0 Off |                  N/A |
| 23%   26C    P8     7W / 250W |      1MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN X (Pascal)    On   | 00000000:09:00.0 Off |                  N/A |
| 23%   24C    P8     9W / 250W |      1MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  TITAN X (Pascal)    On   | 00000000:84:00.0 Off |                  N/A |
| 23%   28C    P8     9W / 250W |      1MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  TITAN X (Pascal)    On   | 00000000:85:00.0 Off |                  N/A |
| 23%   31C    P8     7W / 250W |      1MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  TITAN X (Pascal)    On   | 00000000:88:00.0 Off |                  N/A |
| 23%   23C    P8     7W / 250W |      1MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  TITAN X (Pascal)    On   | 00000000:89:00.0 Off |                  N/A |
| 23%   24C    P8     8W / 250W |      1MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

guunergooner avatar Mar 19 '19 14:03 guunergooner

I think it's due to the gRPC max message size. If you'd like to fix it, the change should be similar to https://github.com/helm/helm/pull/3514.

cheyang avatar Mar 19 '19 16:03 cheyang
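A minimal sketch of the device-plugin side of such a change, assuming its gRPC server is built with google.golang.org/grpc (the helper name here is hypothetical and the plugin's real server setup may differ). Note that the error above was reported on the kubelet (client) side, so the kubelet needs a matching call option as well, which later comments cover:

// Sketch: raise the message-size limits on the plugin's gRPC server so a
// ListAndWatch response carrying tens of thousands of fake device IDs fits.
package deviceplugin

import "google.golang.org/grpc"

const maxMsgSize = 16 * 1024 * 1024 // 16 MiB instead of gRPC's 4 MiB default

func newPluginServer() *grpc.Server {
	return grpc.NewServer(
		grpc.MaxRecvMsgSize(maxMsgSize),
		grpc.MaxSendMsgSize(maxMsgSize),
	)
}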

I think it's due to the gRPC max message size. If you'd like to fix it, the change should be similar to helm/helm#3514.

That doesn't fix my problem. I reviewed the gpushare-device-plugin code at https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/pkg/gpu/nvidia/nvidia.go#L82 and found that the fake device IDs alone amount to roughly len("GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c-_-0") * 12195 * 8 bytes, which is more than 4194304 and overflows the gRPC library default of 4 MB (a rough back-of-the-envelope check follows this comment).

guunergooner avatar Mar 20 '19 04:03 guunergooner
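A quick back-of-the-envelope check of that estimate (a standalone sketch, not plugin code; the constants are taken from the logs above):

package main

import "fmt"

func main() {
	// Roughly len("GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c-_-12195") per entry,
	// before protobuf field tags, length prefixes and the per-device health string.
	const bytesPerEntry = 48
	const entriesPerGPU = 12196 // one fake device per MiB on a 12196 MiB card
	const gpus = 8
	fmt.Println(bytesPerEntry * entriesPerGPU * gpus) // ~4.68 MB, already past the 4194304-byte default
	// The kubelet actually rejected a 5,862,768-byte message once framing and
	// health fields are included.
}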

I mean you can increase the default gRPC max message size to 16 MB in the source code of both the kubelet and the device plugin, compile them into new binaries, and then deploy those. I think that would work. Otherwise, you can use GiB as the memory unit.

cheyang avatar Mar 20 '19 23:03 cheyang

I mean you can increase the default gRPC max message size to 16 MB in the source code of both the kubelet and the device plugin, compile them into new binaries, and then deploy those. I think that would work. Otherwise, you can use GiB as the memory unit.

Thanks, I agree with that solution. I'd recommend adding this case to the User Guide.

guunergooner avatar Mar 21 '19 03:03 guunergooner

Thank you for your suggestions. Would you like to help?

cheyang avatar Mar 23 '19 07:03 cheyang

In that case, the plugin added almost 100,000 device IDs (12,196 per GPU across 8 GPUs, each an object plus a string) just for that one machine. That is a big waste of CPU and memory, and it risks crashing the kubelet. This is an example of gRPC limits being helpful.

Rather than messing with gRPC and building custom plugins and custom kubelets, you could and should just use a different unit. Something like 64 MB, 100 MB, or 128 MB is a reasonable compromise. Having to round requests up also keeps you from packing GPUs perfectly, which is perhaps a good thing if your pods will compete heavily for the same GPU.

therc avatar May 07 '19 00:05 therc

  • Recompile the kubelet from source and replace the original kubelet binary. The place to modify is https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/devicemanager/endpoint.go:

Add the option grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(1024*1024*16)) to the dial method (a sketch follows after this comment).

  • Result:
[root@jenkins ~]# kubectl inspect gpushare
NAME           IPADDRESS      GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU4(Allocated/Total)  GPU5(Allocated/Total)  GPU6(Allocated/Total)  GPU7(Allocated/Total)  GPU Memory(MiB)
192.168.68.13  192.168.68.13  0/12066                0/12066                0/12066                0/12066                0/12066                0/12066                0/12066                0/12066                0/96528
192.168.68.5   192.168.68.5   0/11178                0/11178                0/11178                0/11178                0/11178                0/11178                0/11178                0/11178                0/89424
------------------------------------------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
0/185952 (0%) 

zlingqu avatar Apr 06 '21 01:04 zlingqu
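A minimal sketch of that kubelet-side dial change, assuming a helper around the device-plugin socket dial in endpoint.go (the function name and the surrounding dial options are illustrative; the exact code differs between Kubernetes versions):

package devicemanager

import (
	"context"
	"net"
	"time"

	"google.golang.org/grpc"
)

// dialDevicePlugin connects to the plugin's unix socket, allowing ListAndWatch
// responses up to 16 MiB instead of the 4 MiB default that triggered the
// ResourceExhausted error earlier in this thread.
func dialDevicePlugin(socketPath string) (*grpc.ClientConn, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	return grpc.DialContext(ctx, socketPath,
		grpc.WithInsecure(),
		grpc.WithBlock(),
		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(1024*1024*16)), // the added option
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", addr)
		}),
	)
}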

@cheyang @therc Hi, can you tell me how to set the unit to 128 MiB? I've checked the code and --memory-unit only accepts MiB or GiB; if I set it to 128MiB, the unit falls back to GiB.

joy717 avatar Oct 12 '21 08:10 joy717

Yes, if you want to use 7618 MiB, you should change the unit to MiB in https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/device-plugin-ds.yaml#L28.

Thanks, worked for me :+1: .

debMan avatar Nov 01 '21 14:11 debMan

Yes, if you want to use 7618 MiB, you should change the unit to MiB in https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/device-plugin-ds.yaml#L28.

Thanks, worked for me 👍 .

That did not work for me: I set --memory-unit to MiB, but aliyun.com/gpu-mem still uses GiB.

sloth2012 avatar Oct 28 '22 09:10 sloth2012

In my case, if I set MiB, the command "kubectl inspect gpushare" displays GPU memory in MiB, but when I request GPU memory in a pod it tells me: 0/3 nodes are available: 3 Insufficient GPU Memory in one device.

harrymore avatar Nov 10 '23 06:11 harrymore