Pod fails when a node is specified in K8s

Open sppliyong opened this issue 2 years ago • 6 comments

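Description: Hello, I am using the example from the official documentation. When I add the line `nodeName: ip`, the pod fails to start; when I remove that line, the pod starts. I am running k8s 1.16.8. What is the cause?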

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeName: master
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 1000"]
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 vGPU
          nvidia.com/gpumem: 3000 # Each vGPU contains 3000m device memory (Optional,Integer)
          nvidia.com/gpucores: 60 # Each vGPU uses 60% of the entire GPU (Optional,Integer)

sppliyong avatar Oct 26 '22 06:10 sppliyong

Hello, is the node you specified a GPU node that has already been labeled (`gpu=on`)?

archlitchi avatar Oct 26 '22 07:10 archlitchi

Yes. Without that line, the pod runs directly on master.

sppliyong avatar Oct 26 '22 07:10 sppliyong

Is this a problem with my k8s version? Have you verified this on 1.16.8?

sppliyong avatar Oct 26 '22 09:10 sppliyong

Probably not. It is a problem with the current scheduling policy; we will try to fix it in the next release.
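
For readers hitting the same symptom, a possible workaround is sketched below. In Kubernetes, setting `spec.nodeName` makes the pod bypass the scheduler entirely, so any scheduler-side vGPU allocation step never runs for that pod. Pinning the pod with `nodeSelector` instead keeps the scheduler in the loop. This is an assumption about the cause, not something confirmed in this thread, and the sketch also assumes the node's `kubernetes.io/hostname` label value is `master` (check with `kubectl get nodes --show-labels`).

```yaml
# Sketch only: same pod as in the report, pinned via nodeSelector instead of nodeName.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    kubernetes.io/hostname: master # assumed label value; verify on your cluster
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 1000"]
      resources:
        limits:
          nvidia.com/gpu: 1        # one vGPU
          nvidia.com/gpumem: 3000  # device memory per vGPU
          nvidia.com/gpucores: 60  # percent of GPU cores per vGPU
```

If the target node carries a taint (common on master nodes), a matching toleration would also be needed; that is independent of the vGPU request.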

archlitchi avatar Oct 26 '22 10:10 archlitchi

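The scheduling policy does indeed have a problem. Labeling nodes with a key-value pair, as in `kubectl label nodes accelerator=xxx`, does let the pod land on the master node, but the key has to be `accelerator`. However, pods cannot be scheduled from the master node onto the worker nodes. As soon as I specify a worker node's label, it reports `1 node unregisterd`. When I tested with a web nginx pod, I could pin it to any node and all nodes were available, but with vGPU I run into this whole series of problems.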

sppliyong avatar Oct 27 '22 02:10 sppliyong

One more note: running `kubectl get nodes` on the master node does show the worker nodes.

sppliyong avatar Oct 27 '22 02:10 sppliyong

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] avatar Apr 01 '24 20:04 github-actions[bot]

This issue has not seen any activity since it was marked stale. Closing.

github-actions[bot] avatar Apr 16 '24 20:04 github-actions[bot]