gpu-manager

Pod status is UnexpectedAdmissionError

Open · ledrsnet opened this issue 1 year ago · 1 comment

The YAML for the pod (deployed through a Deployment) is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: vcuda-test
    qcloud-app: vcuda-test
  name: vcuda-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: vcuda-test
  template:
    metadata:
      labels:
        k8s-app: vcuda-test
        qcloud-app: vcuda-test
    spec:
      containers:
      - command:
        - sleep
        - 360000s
        env:
        - name: PATH
          value: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        image: ccr.ccs.tencentyun.com/menghe/tensorflow-gputest:0.2
        imagePullPolicy: IfNotPresent
        name: tensorflow-test
        resources:
          limits:
            cpu: "1"
            memory: 8Gi
            tencent.com/vcuda-core: "50"
            # tencent.com/vcuda-memory: "32"
          requests:
            cpu: "1"
            memory: 8Gi
            tencent.com/vcuda-core: "50"
            # tencent.com/vcuda-memory: "32"

When I comment out the vcuda-memory lines and apply the manifest, the pod status becomes UnexpectedAdmissionError. Running describe on the pod gives the following information:

Reason:         UnexpectedAdmissionError
Message:        Pod Allocate failed due to rpc error: code = Unknown desc = candidate pod not found for request &AllocateRequest{ContainerRequests:[&ContainerAllocateRequest{DevicesIDs:[tencent.com/vcuda-core-67 tencent.com/vcuda-core-198 tencent.com/vcuda-core-41 tencent.com/vcuda-core-60 tencent.com/vcuda-core-1 tencent.com/vcuda-core-94 tencent.com/vcuda-core-180 tencent.com/vcuda-core-132 tencent.com/vcuda-core-126 tencent.com/vcuda-core-152 tencent.com/vcuda-core-165 tencent.com/vcuda-core-101 tencent.com/vcuda-core-169 tencent.com/vcuda-core-183 tencent.com/vcuda-core-50 tencent.com/vcuda-core-159 tencent.com/vcuda-core-19 tencent.com/vcuda-core-113 tencent.com/vcuda-core-184 tencent.com/vcuda-core-64 tencent.com/vcuda-core-56 tencent.com/vcuda-core-195 tencent.com/vcuda-core-109 tencent.com/vcuda-core-193 tencent.com/vcuda-core-71 tencent.com/vcuda-core-37 tencent.com/vcuda-core-142 tencent.com/vcuda-core-123 tencent.com/vcuda-core-122 tencent.com/vcuda-core-4 tencent.com/vcuda-core-86 tencent.com/vcuda-core-168 tencent.com/vcuda-core-59 tencent.com/vcuda-core-93 tencent.com/vcuda-core-166 tencent.com/vcuda-core-128 tencent.com/vcuda-core-145 tencent.com/vcuda-core-53 tencent.com/vcuda-core-102 tencent.com/vcuda-core-12 tencent.com/vcuda-core-173 tencent.com/vcuda-core-30 tencent.com/vcuda-core-90 tencent.com/vcuda-core-0 tencent.com/vcuda-core-117 tencent.com/vcuda-core-105 tencent.com/vcuda-core-108 tencent.com/vcuda-core-148 tencent.com/vcuda-core-172 tencent.com/vcuda-core-161],}],}, allocation failed, which is unexpected
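
For comparison, here is a sketch of the same resources block with both vcuda-memory lines uncommented. The value "32" is taken directly from the commented-out lines in the manifest above. The report only implies that this variant is admitted without the error, so treat it as a reference for what was changed rather than a confirmed fix.

```yaml
# Sketch: resources stanza of the tensorflow-test container with
# tencent.com/vcuda-memory restored in both limits and requests.
# Assumption: the value "32" is the one the reporter used before commenting it out.
resources:
  limits:
    cpu: "1"
    memory: 8Gi
    tencent.com/vcuda-core: "50"
    tencent.com/vcuda-memory: "32"
  requests:
    cpu: "1"
    memory: 8Gi
    tencent.com/vcuda-core: "50"
    tencent.com/vcuda-memory: "32"
```

As with any extended resource in Kubernetes, the requests and limits for each tencent.com/* resource must be equal, which is why the value appears in both places.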

ledrsnet · Mar 17 '23 02:03