
vgpu: GPU memory gets mixed up when pods are scheduled concurrently

Open singeleaf opened this issue 10 months ago • 0 comments

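Running the command below schedules two pods at the same time, one requesting 24576M of GPU memory and the other 600M. Once the pods are up, entering each container and checking with nvidia-smi shows that the two allocations are swapped: container ubuntu-container-24576 was given 600M of GPU memory, and container ubuntu-container-600 was given 24576M.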

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-1v-24576-1
spec:
  schedulerName: volcano
  containers:
    - name: ubuntu-container-24576
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          volcano.sh/vgpu-number: 1 # requesting 1 vGPU
          volcano.sh/vgpu-memory: 24576
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-1v-600-1
spec:
  schedulerName: volcano
  containers:
    - name: ubuntu-container-600
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          volcano.sh/vgpu-number: 1 # requesting 1 vGPU
          volcano.sh/vgpu-memory: 600
EOF
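
The nvidia-smi check described in this report can be scripted roughly as follows. This is only a sketch: `check_vgpu_memory` and `verify_pods` are hypothetical helper names, not part of Volcano, and it assumes that `nvidia-smi` inside each container reports a total memory exactly equal to the requested `volcano.sh/vgpu-memory` value (in MiB). The pod names and limits are taken from the manifest above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: compare the requested vGPU memory (MiB) with the
# value nvidia-smi reported inside the container.
check_vgpu_memory() {
  local name="$1" requested="$2" reported="$3"
  # Tolerate values such as "24576 MiB" by keeping only the leading number.
  reported="${reported%% *}"
  if [ "$requested" = "$reported" ]; then
    echo "${name}: OK (${reported} MiB)"
  else
    echo "${name}: MISMATCH (requested ${requested} MiB, reported ${reported} MiB)"
    return 1
  fi
}

# Hypothetical helper: query both pods from the manifest and check each one.
# Assumption: the reported total equals the requested limit exactly.
verify_pods() {
  local pod requested reported
  while read -r pod requested; do
    # </dev/null keeps kubectl from consuming the loop's input.
    reported="$(kubectl exec "$pod" -- nvidia-smi \
      --query-gpu=memory.total --format=csv,noheader,nounits </dev/null | head -n 1)"
    check_vgpu_memory "$pod" "$requested" "$reported"
  done <<'PODS'
gpu-pod-1v-24576-1 24576
gpu-pod-1v-600-1 600
PODS
}
```

Calling `verify_pods` once both pods are Running prints one line per pod. With the behaviour described in this issue, both lines would be expected to read `MISMATCH`.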

singeleaf Mar 29 '24 15:03