gpushare-scheduler-extender
GPU-mem is the whole GB value, not MB value
Right now, on a g3s.xlarge instance, I'm seeing the gpu-mem value set to 7, even though the host has 1 GPU with 7GB of memory (7618MiB according to nvidia-smi).
If I try to schedule a fraction of gpu-mem (1.5, for example), I'm told I need to use a whole integer.
Should the plugin be exporting 7618 as the gpu-mem value?
Yes, if you want to use 7618MiB, you should change the unit to MiB in https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/device-plugin-ds.yaml#L28.
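As a rough sketch only: the change could look like the excerpt below, assuming the plugin's `--memory-unit` flag (mentioned later in this thread) is passed via the container args. The image and binary names here are placeholders, not the exact contents of that file. After editing, delete and recreate the DaemonSet so the node re-advertises aliyun.com/gpu-mem in the new unit.

```yaml
# Hypothetical excerpt of device-plugin-ds.yaml; only the --memory-unit
# argument is the relevant change (GiB -> MiB).
spec:
  containers:
    - name: gpushare-device-plugin
      image: <device-plugin-image>    # placeholder
      command:
        - gpushare-device-plugin-v2   # placeholder binary name
        - --memory-unit=MiB           # advertise GPU memory in MiB instead of GiB
```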
> Yes, if you want to use 7618MiB, you should change the unit to MiB in https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/device-plugin-ds.yaml#L28.
- I changed the unit to MiB and recreated device-plugin-ds, and now the node's kubelet.service reports a gRPC error:
Mar 19 06:58:31 k8s-node-1 kubelet[12836]: E0319 06:58:31.266996 12836 endpoint.go:106] listAndWatch ended unexpectedly for device plugin aliyun.com/gpu-mem with error rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5862768 vs. 4194304)
Mar 19 06:58:31 k8s-node-1 kubelet[12836]: I0319 06:58:31.267070 12836 manager.go:430] Mark all resources Unhealthy for resource aliyun.com/gpu-mem
- And the gpushare device plugin pod logs are as below:
I0319 13:58:30.902668 1 main.go:18] Start gpushare device plugin
I0319 13:58:30.902780 1 gpumanager.go:28] Loading NVML
I0319 13:58:30.908589 1 gpumanager.go:37] Fetching devices.
I0319 13:58:30.908639 1 gpumanager.go:43] Starting FS watcher.
I0319 13:58:30.908785 1 gpumanager.go:51] Starting OS watcher.
I0319 13:58:30.924544 1 nvidia.go:64] Deivce GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75's Path is /dev/nvidia0
I0319 13:58:30.924630 1 nvidia.go:69] # device Memory: 12196
I0319 13:58:30.924649 1 nvidia.go:40] set gpu memory: 12196
I0319 13:58:30.924659 1 nvidia.go:76] # Add first device ID: GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75-_-0
I0319 13:58:30.935332 1 nvidia.go:79] # Add last device ID: GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75-_-12195
I0319 13:58:30.950346 1 nvidia.go:64] Deivce GPU-a12a3921-ea32-1160-c3b0-394b977ffc84's Path is /dev/nvidia1
I0319 13:58:30.950378 1 nvidia.go:69] # device Memory: 12196
I0319 13:58:30.950388 1 nvidia.go:76] # Add first device ID: GPU-a12a3921-ea32-1160-c3b0-394b977ffc84-_-0
I0319 13:58:30.959102 1 nvidia.go:79] # Add last device ID: GPU-a12a3921-ea32-1160-c3b0-394b977ffc84-_-12195
I0319 13:58:30.985063 1 nvidia.go:64] Deivce GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181's Path is /dev/nvidia2
I0319 13:58:30.985110 1 nvidia.go:69] # device Memory: 12196
I0319 13:58:30.985119 1 nvidia.go:76] # Add first device ID: GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181-_-0
I0319 13:58:30.995293 1 nvidia.go:79] # Add last device ID: GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181-_-12195
I0319 13:58:31.047900 1 nvidia.go:64] Deivce GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c's Path is /dev/nvidia3
I0319 13:58:31.047935 1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.047946 1 nvidia.go:76] # Add first device ID: GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c-_-0
I0319 13:58:31.054558 1 nvidia.go:79] # Add last device ID: GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c-_-12195
I0319 13:58:31.087392 1 nvidia.go:64] Deivce GPU-c9d55403-db94-541a-098e-aa1a4fac438c's Path is /dev/nvidia4
I0319 13:58:31.087415 1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.087423 1 nvidia.go:76] # Add first device ID: GPU-c9d55403-db94-541a-098e-aa1a4fac438c-_-0
I0319 13:58:31.093386 1 nvidia.go:79] # Add last device ID: GPU-c9d55403-db94-541a-098e-aa1a4fac438c-_-12195
I0319 13:58:31.124518 1 nvidia.go:64] Deivce GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579's Path is /dev/nvidia5
I0319 13:58:31.124535 1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.124541 1 nvidia.go:76] # Add first device ID: GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579-_-0
I0319 13:58:31.134973 1 nvidia.go:79] # Add last device ID: GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579-_-12195
I0319 13:58:31.171276 1 nvidia.go:64] Deivce GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363's Path is /dev/nvidia6
I0319 13:58:31.171312 1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.171323 1 nvidia.go:76] # Add first device ID: GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363-_-0
I0319 13:58:31.179836 1 nvidia.go:79] # Add last device ID: GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363-_-12195
I0319 13:58:31.215859 1 nvidia.go:64] Deivce GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750's Path is /dev/nvidia7
I0319 13:58:31.215904 1 nvidia.go:69] # device Memory: 12196
I0319 13:58:31.215916 1 nvidia.go:76] # Add first device ID: GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750-_-0
I0319 13:58:31.223627 1 nvidia.go:79] # Add last device ID: GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750-_-12195
I0319 13:58:31.223647 1 server.go:43] Device Map: map[GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75:0 GPU-a12a3921-ea32-1160-c3b0-394b977ffc84:1 GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181:2 GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c:3 GPU-c9d55403-db94-541a-098e-aa1a4fac438c:4 GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579:5 GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363:6 GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750:7]
I0319 13:58:31.223707 1 server.go:44] Device List: [GPU-d5ac7a2c-c032-3f23-6244-2fc08f8aa363 GPU-0dd2b0c3-3f55-5872-3e17-d6b889e77750 GPU-bda0bcfa-022d-e4a5-ecb7-0ca863a47e75 GPU-a12a3921-ea32-1160-c3b0-394b977ffc84 GPU-4f7ecd0f-69ca-45ab-558e-f0d798c8d181 GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c GPU-c9d55403-db94-541a-098e-aa1a4fac438c GPU-6c5d0cb4-ab2c-3eb8-5c1f-531d39d11579]
I0319 13:58:31.248160 1 podmanager.go:68] No need to update Capacity aliyun.com/gpu-count
I0319 13:58:31.249329 1 server.go:222] Starting to serve on /var/lib/kubelet/device-plugins/aliyungpushare.sock
I0319 13:58:31.250685 1 server.go:230] Registered device plugin with Kubelet
- My nvidia-smi output on the physical machine is below. I think that with multiple cards and the MiB unit, the gRPC message exceeds the default size limit.
Tue Mar 19 07:09:10 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) On | 00000000:04:00.0 Off | N/A |
| 23% 30C P8 7W / 250W | 1MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN X (Pascal) On | 00000000:05:00.0 Off | N/A |
| 23% 29C P8 7W / 250W | 1MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN X (Pascal) On | 00000000:08:00.0 Off | N/A |
| 23% 26C P8 7W / 250W | 1MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 TITAN X (Pascal) On | 00000000:09:00.0 Off | N/A |
| 23% 24C P8 9W / 250W | 1MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 TITAN X (Pascal) On | 00000000:84:00.0 Off | N/A |
| 23% 28C P8 9W / 250W | 1MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 TITAN X (Pascal) On | 00000000:85:00.0 Off | N/A |
| 23% 31C P8 7W / 250W | 1MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 TITAN X (Pascal) On | 00000000:88:00.0 Off | N/A |
| 23% 23C P8 7W / 250W | 1MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 TITAN X (Pascal) On | 00000000:89:00.0 Off | N/A |
| 23% 24C P8 8W / 250W | 1MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I think it's due to the gRPC max message size. If you'd like to fix it, the change should be similar to https://github.com/helm/helm/pull/3514.
That doesn't fix my problem. I reviewed the gpushare-device-plugin code at https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/pkg/gpu/nvidia/nvidia.go#L82 and found that the fake device IDs alone take at least len("GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c-_-0") * 12195 * 8 bytes, which is more than 4194304, so it overflows the gRPC library's default limit of 4MB.
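A quick back-of-the-envelope check of that math (ID length and per-GPU count taken from the plugin logs above; protobuf framing is ignored, so the real message, 5862768 bytes per the kubelet error, is even larger):

```go
package main

import "fmt"

func main() {
	// One fake device ID per MiB, copied from the plugin logs above.
	fakeID := "GPU-17f59c6f-0e44-f0d8-675f-30833e525c5c-_-0"
	idsPerGPU := 12196 // "device Memory: 12196" -> 12196 fake IDs per GPU
	gpus := 8          // this node has 8 TITAN X cards

	rawIDBytes := len(fakeID) * idsPerGPU * gpus
	grpcDefaultLimit := 4 * 1024 * 1024 // 4194304 bytes

	fmt.Println("ID bytes alone:", rawIDBytes) // ~4.3 MB before protobuf overhead
	fmt.Println("gRPC default recv limit:", grpcDefaultLimit)
	fmt.Println("over the limit:", rawIDBytes > grpcDefaultLimit)
}
```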
I mean you can increase the default gRPC max message size to 16MB in the source code of the kubelet and the device plugin, compile them into new binaries, and deploy those. I think that can work. Otherwise, you can use GiB as the memory unit.
Thanks, I agree with the solution. It would be worth adding this case to the User Guide.
Thank you for your suggestions. Would you like to help?
In that case, it added almost 100,000 device IDs (object+string) just for that machine. It's a big waste of CPU and memory and risks causing crashes in the kubelet. This is an example of gRPC limits being helpful.
Rather than messing with gRPC and building custom plugins and custom kubelets, you could and should just use a different unit. Something like 64MB, 100MB or 128MB is a reasonable compromise. Having to round up numbers also prevents you from packing things perfectly, which is perhaps a good idea if your pods will compete a lot for the same GPU.
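Purely to illustrate that trade-off (the plugin itself only accepts MiB or GiB today, as noted further down, so anything else is hypothetical), here is a rough sketch of how many fake device IDs each unit size would imply for a 12196 MiB card:

```go
package main

import "fmt"

func main() {
	totalMiB := 12196 // per-GPU memory from the logs above
	gpus := 8

	// Number of fake device IDs a plugin would have to advertise if it
	// exposed memory in chunks of the given size (hypothetical for any
	// chunk size other than 1 MiB or whole GiB).
	for _, chunkMiB := range []int{1, 64, 128, 1024} {
		idsPerGPU := totalMiB / chunkMiB // rounds down; a little memory goes unused
		fmt.Printf("%4d MiB chunks: %5d IDs per GPU, %6d IDs per node\n",
			chunkMiB, idsPerGPU, idsPerGPU*gpus)
	}
}
```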
- Recompile the kubelet from source and replace the original kubelet. The code to change is in https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/devicemanager/endpoint.go: add grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(1024*1024*16)) as a parameter to the dial call (see the sketch after the output below).
- Result:
[root@jenkins ~]# kubectl inspect gpushare
NAME IPADDRESS GPU0(Allocated/Total) GPU1(Allocated/Total) GPU2(Allocated/Total) GPU3(Allocated/Total) GPU4(Allocated/Total) GPU5(Allocated/Total) GPU6(Allocated/Total) GPU7(Allocated/Total) GPU Memory(MiB)
192.168.68.13 192.168.68.13 0/12066 0/12066 0/12066 0/12066 0/12066 0/12066 0/12066 0/12066 0/96528
192.168.68.5 192.168.68.5 0/11178 0/11178 0/11178 0/11178 0/11178 0/11178 0/11178 0/11178 0/89424
------------------------------------------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
0/185952 (0%)
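For reference, a minimal sketch of the kind of dial change described above, assuming a plain gRPC-Go client connecting to the plugin's unix socket (the real kubelet endpoint code sets up additional options, so this is not the exact patch):

```go
package main

import (
	"net"
	"time"

	"google.golang.org/grpc"
)

// dialPlugin connects to a device-plugin socket with the receive limit
// raised from the 4 MiB default to 16 MiB, mirroring the change above.
func dialPlugin(socketPath string) (*grpc.ClientConn, error) {
	return grpc.Dial(socketPath,
		grpc.WithInsecure(),
		grpc.WithBlock(),
		grpc.WithTimeout(10*time.Second),
		// The important part: without this, ListAndWatch responses larger
		// than 4194304 bytes are rejected with ResourceExhausted.
		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(1024*1024*16)),
		grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
			return net.DialTimeout("unix", addr, timeout)
		}),
	)
}

func main() {
	_, _ = dialPlugin("/var/lib/kubelet/device-plugins/aliyungpushare.sock")
}
```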
@cheyang @therc
Hi, can you tell me how to set the unit to 128MiB? I've checked the code; `--memory-unit` only accepts `MiB` or `GiB`. If I set it to 128MiB, the unit falls back to GiB.
> Yes, if you want to use 7618MiB, you should change the unit to MiB in https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/device-plugin-ds.yaml#L28.
Thanks, worked for me :+1: .
> Yes, if you want to use 7618MiB, you should change the unit to MiB in https://github.com/AliyunContainerService/gpushare-device-plugin/blob/master/device-plugin-ds.yaml#L28.
>
> Thanks, worked for me 👍.
It does not work for me: `--memory-unit` is set to MiB, but aliyun.com/gpu-mem still uses GiB.
My case is that if I set MiB, the command "kubectl inspect gpushare" displays the GPUs with the MiB unit, but when I request GPU memory in a pod it tells me: 0/3 nodes are available: 3 Insufficient GPU Memory in one device.
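For context, requesting the shared resource in a pod looks roughly like the sketch below (the pod name, image, and request value are placeholders). With --memory-unit=MiB the request is interpreted as MiB, and per the error above it must also fit within a single device's remaining memory:

```yaml
# Hypothetical pod sketch; image and numbers are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-mem-demo
spec:
  containers:
    - name: demo
      image: <cuda-image>              # placeholder
      resources:
        limits:
          aliyun.com/gpu-mem: 1024     # MiB when the plugin runs with --memory-unit=MiB
```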