gpushare-scheduler-extender
Allocated 8 GiB of GPU memory, but actual usage is not limited to 8 GiB
I deployed a single-node k8s cluster on my own server, installed and configured the Alibaba GPU-share plugin following the tutorial, then deployed Kubeflow and created a notebook in it configured with 8 GiB of GPU memory. In actual use, however, it still takes up almost the entire memory of one card. Or is it expected that, when there are no other tasks, it simply fills up the whole card?
Output of nvidia-smi:
```
root@chnlj-Super-Server:~# nvidia-smi
Wed Aug 25 14:20:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.84       Driver Version: 460.84       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:02:00.0 Off |                  N/A |
| 13%   44C    P2    60W / 257W |  10571MiB / 11019MiB |     10%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:03:00.0 Off |                  N/A |
| 13%   32C    P8     2W / 257W |      0MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:82:00.0 Off |                  N/A |
| 13%   31C    P8    21W / 257W |      0MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      9750      C   /opt/conda/bin/python3            471MiB |
|    0   N/A  N/A     27519      C   /opt/conda/bin/python3          10097MiB |
+-----------------------------------------------------------------------------+
```
Output of kubectl-inspect-gpushare:
```
root@chnlj-Super-Server:/kubeflowData/pv/pv5# kubectl-inspect-gpushare
NAME                IPADDRESS     GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU Memory(GiB)
chnlj-super-server  172.16.15.34  8/10                   0/10                   0/10                   8/30
------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
8/30 (26%)
```
Output of kubectl describe pod:
```
root@chnlj-Super-Server:~# kubectl describe pod luk-test-gpu-notebook-0 -nkubeflow-user-example-com
Name:         luk-test-gpu-notebook-0
Namespace:    kubeflow-user-example-com
Priority:     0
Node:         chnlj-super-server/172.16.15.34
Start Time:   Wed, 25 Aug 2021 14:02:27 +0800
Labels:       app=luk-test-gpu-notebook
              controller-revision-hash=luk-test-gpu-notebook-68745bcf4c
              istio.io/rev=default
              notebook-name=luk-test-gpu-notebook
              security.istio.io/tlsMode=istio
              service.istio.io/canonical-name=luk-test-gpu-notebook
              service.istio.io/canonical-revision=latest
              statefulset=luk-test-gpu-notebook
              statefulset.kubernetes.io/pod-name=luk-test-gpu-notebook-0
Annotations:  ALIYUN_COM_GPU_MEM_ASSIGNED: true
              ALIYUN_COM_GPU_MEM_ASSUME_TIME: 1629871347742396836
              ALIYUN_COM_GPU_MEM_DEV: 10
              ALIYUN_COM_GPU_MEM_IDX: 0
              ALIYUN_COM_GPU_MEM_POD: 8
              kubectl.kubernetes.io/default-logs-container: luk-test-gpu-notebook
              prometheus.io/path: /stats/prometheus
              prometheus.io/port: 15020
              prometheus.io/scrape: true
              sidecar.istio.io/status:
                {"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istiod-ca-cert"],"ima...
Status:       Running
IP:           10.244.0.168
IPs:
  IP:  10.244.0.168
Controlled By:  StatefulSet/luk-test-gpu-notebook
Init Containers:
  istio-init:
    Container ID:  docker://fb5d0fdf3b74d116decffc03b0949a3df426dadfca4b7deefd76eaaffd7916a2
    Image:         docker.io/istio/proxyv2:1.9.0
    Image ID:      docker-pullable://istio/proxyv2@sha256:286b821197d7a9233d1d889119f090cd9a9394468d3a312f66ea24f6e16b2294
    Port:          <none>
    Host Port:     <none>
    Args:
      istio-iptables
      -p
      15001
      -z
      15006
      -u
      1337
      -m
      REDIRECT
      -i
      *
      -x
      -b
      *
      -d
      15090,15021,15020
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 25 Aug 2021 14:02:32 +0800
      Finished:     Wed, 25 Aug 2021 14:02:33 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:     10m
      memory:  40Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-editor-token-7h5c7 (ro)
Containers:
  luk-test-gpu-notebook:
    Container ID:   docker://459637e1f4fdd52a06840dcd09aa19c4ed662c9a4bb005673a044b0d0b7cc948
    Image:          public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-tensorflow-cuda-full:v1.3.0-rc.0
    Image ID:       docker-pullable://public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-tensorflow-cuda-full@sha256:4b3f2dbf8fca0de3451a98d628700e4249e2a21ccb52db1853d4a2904e31e9a2
    Port:           8888/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 25 Aug 2021 14:02:34 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      aliyun.com/gpu-mem:  8
    Requests:
      aliyun.com/gpu-mem:  8
      cpu:                 1
      memory:              8Gi
    Environment:
      NB_PREFIX:  /notebook/kubeflow-user-example-com/luk-test-gpu-notebook
    Mounts:
      /dev/shm from dshm (rw)
      /home/jovyan from luk-test-gpu-notebook-ws (rw)
      /home/jovyan/kubeflow-data from kubeflow-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-editor-token-7h5c7 (ro)
  istio-proxy:
    Container ID:  docker://9e5e4a6495b3ab40a9b23e9de0b7a375f3e8c4539f6ea03501c96a5204961018
    Image:         docker.io/istio/proxyv2:1.9.0
    Image ID:      docker-pullable://istio/proxyv2@sha256:286b821197d7a9233d1d889119f090cd9a9394468d3a312f66ea24f6e16b2294
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --serviceCluster
      luk-test-gpu-notebook.$(POD_NAMESPACE)
      --proxyLogLevel=warning
      --proxyComponentLogLevel=misc:error
      --log_output_level=default:info
      --concurrency
      2
    State:          Running
      Started:      Wed, 25 Aug 2021 14:02:38 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      10m
      memory:   40Mi
    Readiness:  http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
    Environment:
      JWT_POLICY:                    first-party-jwt
      PILOT_CERT_PROVIDER:           istiod
      CA_ADDR:                       istiod.istio-system.svc:15012
      POD_NAME:                      luk-test-gpu-notebook-0 (v1:metadata.name)
      POD_NAMESPACE:                 kubeflow-user-example-com (v1:metadata.namespace)
      INSTANCE_IP:                   (v1:status.podIP)
      SERVICE_ACCOUNT:               (v1:spec.serviceAccountName)
      HOST_IP:                       (v1:status.hostIP)
      CANONICAL_SERVICE:             (v1:metadata.labels['service.istio.io/canonical-name'])
      CANONICAL_REVISION:            (v1:metadata.labels['service.istio.io/canonical-revision'])
      PROXY_CONFIG:                  {}
      ISTIO_META_POD_PORTS:          [
                                         {"name":"notebook-port","containerPort":8888,"protocol":"TCP"}
                                     ]
      ISTIO_META_APP_CONTAINERS:     luk-test-gpu-notebook
      ISTIO_META_CLUSTER_ID:         Kubernetes
      ISTIO_META_INTERCEPTION_MODE:  REDIRECT
      ISTIO_META_WORKLOAD_NAME:      luk-test-gpu-notebook
      ISTIO_META_OWNER:              kubernetes://apis/apps/v1/namespaces/kubeflow-user-example-com/statefulsets/luk-test-gpu-notebook
      ISTIO_META_MESH_ID:            cluster.local
      TRUST_DOMAIN:                  cluster.local
    Mounts:
      /etc/istio/pod from istio-podinfo (rw)
      /etc/istio/proxy from istio-envoy (rw)
      /var/lib/istio/data from istio-data (rw)
      /var/run/secrets/istio from istiod-ca-cert (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-editor-token-7h5c7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  istio-podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
      limits.cpu -> cpu-limit
      requests.cpu -> cpu-request
  istiod-ca-cert:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      istio-ca-root-cert
    Optional:  false
  luk-test-gpu-notebook-ws:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  luk-test-gpu-notebook-ws
    ReadOnly:   false
  kubeflow-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  kubeflow-data
    ReadOnly:   false
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  default-editor-token-7h5c7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-editor-token-7h5c7
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---  ----               -------
  Warning  FailedScheduling  23m  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  23m  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         23m  default-scheduler  Successfully assigned kubeflow-user-example-com/luk-test-gpu-notebook-0 to chnlj-super-server
  Normal   Pulling           23m  kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal   Pulled            22m  kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 3.956645911s
  Normal   Created           22m  kubelet            Created container istio-init
  Normal   Started           22m  kubelet            Started container istio-init
  Normal   Pulled            22m  kubelet            Container image "public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-tensorflow-cuda-full:v1.3.0-rc.0" already present on machine
  Normal   Created           22m  kubelet            Created container luk-test-gpu-notebook
  Normal   Started           22m  kubelet            Started container luk-test-gpu-notebook
  Normal   Pulling           22m  kubelet            Pulling image "docker.io/istio/proxyv2:1.9.0"
  Normal   Pulled            22m  kubelet            Successfully pulled image "docker.io/istio/proxyv2:1.9.0" in 4.057881696s
  Normal   Created           22m  kubelet            Created container istio-proxy
  Normal   Started           22m  kubelet            Started container istio-proxy
```
The memory nvidia-smi shows doesn't seem to be what is actually in use. I ran into this problem before as well: by default, TF 2.0 reserves all the remaining GPU memory, but won't necessarily use it. I later switched to on-demand allocation, so it only allocates as much as it actually uses. With that set, if the memory actually used exceeds what gpushare allocated, the cgroup kills the process.
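For reference, a minimal sketch of the on-demand ("memory growth") setting described above, using the TF 2.x API; it must run before anything initializes the GPU:

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving nearly the whole card
# at startup; must be called before the GPU is first used.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```

Note that memory growth only stops TF from pre-reserving the card; on its own it does not enforce any upper bound.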
What the official example shows is: gpushare allocates 3 GiB; inside the TF 2.0 container, nvidia-smi shows 3 GiB and usage never exceeds that 3 GiB; and checking on the host (assuming a 10 GiB card) should likewise show only 3 GiB of the 10 GiB total in use.
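To actually keep a TF process within its share without external isolation, you can also hard-cap it from inside the container. A hedged sketch using the 3 GiB figure from the example above (TF 2.4+ names; older 2.x releases use tf.config.experimental.set_virtual_device_configuration instead):

```python
import tensorflow as tf

# Hard-cap this process at 3 GiB to match the gpushare allocation in the
# example above; memory_limit is in MiB and must be set before first GPU use.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=3 * 1024)])
```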
See this example: "Running a GPU-sharing instance".
The gpushare scheduler schedules jobs across the cluster with GPU memory as the unit of accounting; that is, it finds which GPU card on which node can still provide the amount of memory the job requests. Once the pod is scheduled onto a node, the chosen GPU card is bound into the container, and at that point scheduling is done. If you also want to limit the amount of memory a process actually uses inside the container, you additionally need GPU isolation, which is outside the scheduler's capabilities. For per-card GPU memory isolation on a node, consider Alibaba Cloud's cGPU, NVIDIA's MPS, MIG on the NVIDIA A100, and so on.
Alibaba Cloud cGPU
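To make the scheduling half of this concrete, below is a minimal sketch of a scheduler-extender "filter" webhook, written in Python purely for illustration (the real project is in Go): the default scheduler POSTs ExtenderArgs JSON and expects an ExtenderFilterResult back. The port and the FREE_GPU_MEM table are made-up illustration values:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

GPU_MEM_RESOURCE = "aliyun.com/gpu-mem"

# Hypothetical view of free GPU memory (GiB) per card on each node.
FREE_GPU_MEM = {"chnlj-super-server": [2, 10, 10]}

def requested_gpu_mem(pod):
    """Sum the aliyun.com/gpu-mem limits over all containers (plain integers)."""
    total = 0
    for c in pod["spec"]["containers"]:
        limits = c.get("resources", {}).get("limits", {})
        total += int(limits.get(GPU_MEM_RESOURCE, 0))
    return total

class FilterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        args = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # nodeCacheCapable extenders receive node names only; otherwise full
        # node objects arrive under "nodes".
        names = args.get("nodenames") or [
            n["metadata"]["name"]
            for n in (args.get("nodes") or {}).get("items", [])
        ]
        want = requested_gpu_mem(args["pod"])
        ok, failed = [], {}
        for name in names:
            # The core gpushare idea: the request must fit on ONE card,
            # not be spread across all cards of the node.
            if any(free >= want for free in FREE_GPU_MEM.get(name, [])):
                ok.append(name)
            else:
                failed[name] = "no single GPU with enough free memory"
        body = json.dumps({"nodenames": ok, "failedNodes": failed}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 32766), FilterHandler).serve_forever()
```

Binding the chosen card into the container (typically via NVIDIA_VISIBLE_DEVICES) then happens on the device-plugin side, and, as noted above, enforcing the memory limit is not part of either step.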
Hi, then I'd like to know: what role does the configuration in the YAML file actually play?
Actual usage is not being limited.
Is Alibaba's cGPU solution open source?
This project most likely just uses the k8s device-plugin mechanism to report GPU resources (card count and GPU memory), then uses the k8s scheduler-extender mechanism to add a custom scheduler that places a pod on a specific card of a specific node. Actually enforcing the limit would be a cgroups-like mechanism, implemented cleanly at the kernel + GPU driver level, or via user-space CUDA API interception; personally I lean toward the former. There is also NVIDIA's MIG, but MIG supports only a limited range of GPUs.