[bug] update SeldonDeployment got `field is immutable`

Open guhan121 opened this issue 3 years ago • 4 comments

Describe the bug

We need to update a SeldonDeployment through the API. We took an existing SeldonDeployment, seldon-test-ali-1-cp-cp, added one model to it, and updated it with code like this:

	// Look up the existing SeldonDeployment, if any.
	got, er := clientset.MachinelearningV1().SeldonDeployments(ns).
		Get(context.Background(), seldonDeployment.Name, metav1.GetOptions{})
	switch {
	case er == nil:
		// Found: copy the desired spec and metadata onto the live object and update it.
		got.Spec = seldonDeployment.Spec
		got.ClusterName = seldonDeployment.ClusterName
		got.Annotations = seldonDeployment.Annotations
		got.Labels = seldonDeployment.Labels
		logs.Info("Updating SeldonDeployment")
		_, er = clientset.MachinelearningV1().SeldonDeployments(ns).Update(context.Background(),
			got, metav1.UpdateOptions{})
	case k8sErrors.IsNotFound(er):
		// Not found: create it.
		_, er = clientset.MachinelearningV1().SeldonDeployments(ns).Create(context.Background(),
			&seldonDeployment, metav1.CreateOptions{})
	}
	// Any other Get error stays in er for the caller to handle.

Before the update, the old deployment worked fine. After the update, the SeldonDeployment status is:

  deploymentStatus:
    seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1:
      availableReplicas: 1
      replicas: 1
    seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-1-ali-1-cp-cp-2:
      availableReplicas: 1
      replicas: 1
    seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-2-ali-1-cp-cp-3:
      availableReplicas: 1
      replicas: 1
  description: 'Deployment.apps "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1"
    is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"seldon-app":"seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp",
    "seldon-app-svc":"seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-seldon-copy-9",
    "seldon-deployment-id":"seldon-test-ali-1-cp-cp"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
    field is immutable'
  replicas: 1
  state: Failed

I guess the operator's current strategy is to update the old Deployment in place.

Old Deployment (env and mounts removed from the output):

---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '4'
  creationTimestamp: '2022-06-01T07:38:32Z'
  generation: 4
  labels:
    app: seldon-test-ali-1-cp-cp
    app.kubernetes.io/managed-by: seldon-core
  name: seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1
  namespace: seldon-test
  ownerReferences:
  - apiVersion: machinelearning.seldon.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SeldonDeployment
    name: seldon-test-ali-1-cp-cp
    uid: 5f6ed478-74bc-436a-a7ff-b0b0032f4736
  resourceVersion: '333984105'
  selfLink: "/apis/apps/v1/namespaces/seldon-test/deployments/seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1"
  uid: ff89c002-5d19-41f9-91a0-83a56ff2d7d1
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      seldon-app: seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp
      seldon-app-svc: seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-seldon-copy-7
      seldon-deployment-id: seldon-test-ali-1-cp-cp
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/path: "/prometheus"
        prometheus.io/scrape: 'true'
      creationTimestamp:
      labels:
        app: seldon-test-ali-1-cp-cp
        app.kubernetes.io/managed-by: seldon-core
        deploy-type: test
    spec:
      containers:
      - command:
        - bash
        - "/data/docker_script/entrypoint.sh"
        - 'false'
        - 'false'
        - cd bin/ && seldon-core-microservice MyModel --service-type MODEL
        image: xxxxxxx/default/seldon-test:t05051734.4af82392.r
        imagePullPolicy: IfNotPresent
        name: seldon-test-1
        ports:
        - containerPort: 6000
          name: metrics
          protocol: TCP
        - containerPort: 9000
          name: http
          protocol: TCP
        - containerPort: 9500
          name: grpc
          protocol: TCP
        securityContext:
          runAsUser: 0
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
      - command:
        - bash
        - "/data/docker_script/entrypoint.sh"
        - 'false'
        - 'false'
        - cd bin/ && seldon-core-microservice MyModel --service-type MODEL
        image: xxxxx.com/default/seldon-copy:t06011435.4af82392.r
        name: seldon-copy-7
        ports:
        - containerPort: 6001
          name: metrics
          protocol: TCP
        - containerPort: 9001
          name: http
          protocol: TCP
        - containerPort: 9501
          name: grpc
          protocol: TCP
        securityContext:
          runAsUser: 0
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
      - args:
        - "--sdep"
        - seldon-test-ali-1-cp-cp
        - "--namespace"
        - seldon-test
        - "--predictor"
        - seldon-test-ali-1-cp-cp
        - "--http_port"
        - '8000'
        - "--grpc_port"
        - '5001'
        - "--protocol"
        - seldon
        - "--prometheus_path"
        - "/prometheus"
        - "--server_type"
        - rpc
        - "--log_work_buffer_size"
        - '10000'
        - "--log_write_timeout_ms"
        - '2000'
        image: xxxxxx/seam/seldon-core-executor:1.13.1
        imagePullPolicy: IfNotPresent
        name: seldon-container-engine
        securityContext:
          runAsUser: 8888
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: "/etc/podinfo"
          name: seldon-podinfo
      dnsPolicy: ClusterFirst
      priorityClassName: highest-priority
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsUser: 8888
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: '2022-06-01T08:19:41Z'
    lastUpdateTime: '2022-06-01T08:21:23Z'
    message: ReplicaSet "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1-bf545968b"
      has successfully progressed.
    reason: NewReplicaSetAvailable
    status: 'True'
    type: Progressing
  - lastTransitionTime: '2022-06-01T08:30:56Z'
    lastUpdateTime: '2022-06-01T08:30:56Z'
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: 'False'
    type: Available
  observedGeneration: 4
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

kubectl delete deployment -n seldon-test seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1
deployment.apps "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1" deleted

After I deleted it, the operator rebuilt it. The new Deployment looks like this:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '1'
  creationTimestamp: '2022-06-01T08:52:51Z'
  generation: 1
  labels:
    app: seldon-test-ali-1-cp-cp
    app.kubernetes.io/managed-by: seldon-core
  name: seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1
  namespace: seldon-test
  ownerReferences:
  - apiVersion: machinelearning.seldon.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SeldonDeployment
    name: seldon-test-ali-1-cp-cp
    uid: 5f6ed478-74bc-436a-a7ff-b0b0032f4736
  resourceVersion: '334019589'
  selfLink: "/apis/apps/v1/namespaces/seldon-test/deployments/seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1"
  uid: 88a64fa5-f702-4ace-b16f-f83c7fb08dba
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      seldon-app: seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp
      seldon-app-svc: seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-seldon-copy-9
      seldon-deployment-id: seldon-test-ali-1-cp-cp
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/path: "/prometheus"
        prometheus.io/scrape: 'true'
      creationTimestamp:
      labels:
        app: seldon-test-ali-1-cp-cp
        app.kubernetes.io/managed-by: seldon-core
        deploy-type: test
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: MEM-CAP
                operator: In
                values:
                - 16GB
                - 32GB
                - 48GB
                - 64GB
                - 128GB
            weight: 100
          - preference:
              matchExpressions:
              - key: sub.biz.type
                operator: DoesNotExist
            weight: 100
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: project-name
                  operator: In
                  values:
                  - seldon-test
                - key: deploy-type
                  operator: In
                  values:
                  - test
              topologyKey: kubernetes.io/hostname
            weight: 10
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - seldon-test-ali-1-cp-cp
              topologyKey: kubernetes.io/hostname
            weight: 20
      containers:
      - command:
        - bash
        - "/data/docker_script/entrypoint.sh"
        - 'false'
        - 'false'
        - cd bin/ && seldon-core-microservice MyModel --service-type MODEL
        image: xxxxxxx/default/seldon-test:t05051734.4af82392.r
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - "/bin/sh"
              - "-c"
              - sleep 5
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 60
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 9000
          timeoutSeconds: 1
        name: seldon-test-1
        ports:
        - containerPort: 6000
          name: metrics
          protocol: TCP
        - containerPort: 9000
          name: http
          protocol: TCP
        - containerPort: 9500
          name: grpc
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 9000
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 20Gi
            memory: 107374182400m
          requests:
            cpu: 150m
            ephemeral-storage: 4Gi
            memory: 30Mi
        securityContext:
          runAsUser: 0
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
      - command:
        - bash
        - "/data/docker_script/entrypoint.sh"
        - 'false'
        - 'false'
        - cd bin/ && seldon-core-microservice MyModel --service-type MODEL
        image: xxxxx.com/default/seldon-copy:t06011435.4af82392.r
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - "/bin/sh"
              - "-c"
              - sleep 5
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 60
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 9001
          timeoutSeconds: 1
        name: seldon-copy-7
        ports:
        - containerPort: 6001
          name: metrics
          protocol: TCP
        - containerPort: 9001
          name: http
          protocol: TCP
        - containerPort: 9501
          name: grpc
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 9001
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 20Gi
            memory: 107374182400m
          requests:
            cpu: 150m
            ephemeral-storage: 4Gi
            memory: 30Mi
        securityContext:
          runAsUser: 0
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
      - command:
        - bash
        - "/data/docker_script/entrypoint.sh"
        - 'false'
        - 'false'
        - cd bin/ && seldon-core-microservice MyModel --service-type MODEL
        image: xxxx/default/seldon-copy:t06011435.4af82392.r
        imagePullPolicy: IfNotPresent
        lifecycle:
          preStop:
            exec:
              command:
              - "/bin/sh"
              - "-c"
              - sleep 5
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 60
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 9002
          timeoutSeconds: 1
        name: seldon-copy-9
        ports:
        - containerPort: 6002
          name: metrics
          protocol: TCP
        - containerPort: 9002
          name: http
          protocol: TCP
        - containerPort: 9502
          name: grpc
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 9002
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 500m
            ephemeral-storage: 20Gi
            memory: 107374182400m
          requests:
            cpu: 150m
            ephemeral-storage: 4Gi
            memory: 30Mi
        securityContext:
          runAsUser: 0
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
      - args:
        - "--sdep"
        - seldon-test-ali-1-cp-cp
        - "--namespace"
        - seldon-test
        - "--predictor"
        - seldon-test-ali-1-cp-cp
        - "--http_port"
        - '8000'
        - "--grpc_port"
        - '5001'
        - "--protocol"
        - seldon
        - "--prometheus_path"
        - "/prometheus"
        - "--server_type"
        - rpc
        - "--log_work_buffer_size"
        - '10000'
        - "--log_write_timeout_ms"
        - '2000'
        image: xxxxx.com/seam/seldon-core-executor:1.13.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: "/live"
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 60
        name: seldon-container-engine
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        - containerPort: 8000
          name: metrics
          protocol: TCP
        - containerPort: 5001
          name: grpc
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: "/ready"
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 60
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
        securityContext:
          runAsUser: 8888
        terminationMessagePath: "/dev/termination-log"
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: "/etc/podinfo"
          name: seldon-podinfo
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: repo-hz
      - name: ee-repo-hz
      nodeSelector:
        biz.type: common
      priorityClassName: highest-priority
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsUser: 8888
      terminationGracePeriodSeconds: 30
status:
  conditions:
  - lastTransitionTime: '2022-06-01T08:52:51Z'
    lastUpdateTime: '2022-06-01T08:52:51Z'
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: 'False'
    type: Available
  - lastTransitionTime: '2022-06-01T08:52:51Z'
    lastUpdateTime: '2022-06-01T08:52:51Z'
    message: ReplicaSet "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1-7d8945f447"
      is progressing.
    reason: ReplicaSetUpdated
    status: 'True'
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1

The new Deployment has 4 containers, but the old one had only 3.
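To make the failure concrete, here is a minimal sketch that compares the `matchLabels` of the two Deployments dumped above. In `apps/v1`, a Deployment's `spec.selector` cannot be changed after creation, so any differing key is enough for the API server to reject an in-place update. The helper name `changedSelectorKeys` is made up for illustration and is not part of Seldon Core or client-go; the label values are copied from the YAML in this issue.

```go
package main

import (
	"fmt"
	"sort"
)

// changedSelectorKeys is a hypothetical helper (not from Seldon Core).
// It returns, in sorted order, every key whose value differs between two
// matchLabels maps, including keys present in only one of them.
func changedSelectorKeys(oldSel, newSel map[string]string) []string {
	seen := map[string]bool{}
	var changed []string
	for k, v := range oldSel {
		seen[k] = true
		if nv, ok := newSel[k]; !ok || nv != v {
			changed = append(changed, k)
		}
	}
	for k := range newSel {
		if !seen[k] {
			changed = append(changed, k)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// Selector of the old Deployment, as shown in the issue.
	oldSel := map[string]string{
		"seldon-app":           "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp",
		"seldon-app-svc":       "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-seldon-copy-7",
		"seldon-deployment-id": "seldon-test-ali-1-cp-cp",
	}
	// Selector the operator tried to apply (and used after the rebuild).
	newSel := map[string]string{
		"seldon-app":           "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp",
		"seldon-app-svc":       "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-seldon-copy-9",
		"seldon-deployment-id": "seldon-test-ali-1-cp-cp",
	}
	for _, k := range changedSelectorKeys(oldSel, newSel) {
		fmt.Printf("%s: %q -> %q\n", k, oldSel[k], newSel[k])
	}
}
```

Only `seldon-app-svc` differs between the two selectors, which is consistent with the update being rejected and with the delete-and-recreate workaround succeeding.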

To reproduce

Expected behaviour

Environment

kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.10", GitCommit:"8152330a2b6ca3621196e62966ef761b8f5a61bb", GitTreeState:"clean", BuildDate:"2021-08-11T18:06:15Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.3", GitCommit:"01849e73f3c86211f05533c2e807736e776fcf29", GitTreeState:"clean", BuildDate:"2021-02-17T12:35:49Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

seldon-operator : 1.13.0

Model Details

  • Images of your model: [Output of: kubectl get seldondeployment -n <yourmodelnamespace> <seldondepname> -o yaml | grep image: where <yourmodelnamespace>]
  • Logs of your model: [You can get the logs of your model by running kubectl logs -n <yourmodelnamespace> <seldonpodname> <container>]
Events:
  Type     Reason            Age                   From                       Message
  ----     ------            ----                  ----                       -------
  Normal   UpdateService     51m (x5 over 100m)    seldon-controller-manager  Updated Service "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-seldon-copy-7"
  Normal   Updated           41m (x73 over 2d2h)   seldon-controller-manager  Updated SeldonDeployment "seldon-test-ali-1-cp-cp"
  Normal   UpdateDeployment  26m (x4 over 68m)     seldon-controller-manager  Updated Deployment "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-1-ali-1-cp-cp-2"
  Normal   UpdateDeployment  26m (x3 over 64m)     seldon-controller-manager  Updated Deployment "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1"
  Normal   UpdateDeployment  26m (x3 over 64m)     seldon-controller-manager  Updated Deployment "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-2-ali-1-cp-cp-3"
  Normal   UpdateService     15m (x9 over 2d1h)    seldon-controller-manager  Updated Service "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-seldon-test-2"
  Normal   CreateService     15m                   seldon-controller-manager  Created Service "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-seldon-copy-9"
  Warning  InternalError     5m23s (x18 over 15m)  seldon-controller-manager  Deployment.apps "seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-0-ali-1-cp-cp-1" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"seldon-app":"seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp", "seldon-app-svc":"seldon-test-ali-1-cp-cp-seldon-test-ali-1-cp-cp-seldon-copy-9", "seldon-deployment-id":"seldon-test-ali-1-cp-cp"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

guhan121 • Jun 01 '22 08:06

https://github.com/SeldonIO/seldon-core/blob/db94a9954aea3aa8f0228a9e1831715579de26fb/operator/controllers/seldondeployment_controller.go#L751

The function createContainerService is called inside this loop:

for k := 0; k < len(cSpec.Spec.Containers); k++ {
}

at https://github.com/SeldonIO/seldon-core/blob/db94a9954aea3aa8f0228a9e1831715579de26fb/operator/controllers/seldondeployment_controller.go#L509

Does that mean the final value is the last container's service name?
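The hypothesis in this comment can be sketched as follows. This is not the operator's code: `containerServiceName` and `selectorFor` are hypothetical helpers, and the `<sdep>-<predictor>-<container>` naming is an assumption inferred from the service names in the events above. The sketch only shows that if a loop writes the same label key once per container, the value from the last container wins, so appending a container changes the selector.

```go
package main

import "fmt"

// containerServiceName is a hypothetical helper. It assumes the per-container
// service name is "<sdep>-<predictor>-<container>", which matches the names
// seen in this issue but is not taken from the operator source.
func containerServiceName(sdep, predictor, container string) string {
	return sdep + "-" + predictor + "-" + container
}

// selectorFor mimics a loop that sets the same label key once per container,
// so only the value computed for the last container survives.
func selectorFor(sdep, predictor string, containers []string) map[string]string {
	labels := map[string]string{
		"seldon-app":           sdep + "-" + predictor,
		"seldon-deployment-id": sdep,
	}
	for k := 0; k < len(containers); k++ {
		// Each iteration overwrites the previous value.
		labels["seldon-app-svc"] = containerServiceName(sdep, predictor, containers[k])
	}
	return labels
}

func main() {
	sdep, predictor := "seldon-test-ali-1-cp-cp", "seldon-test-ali-1-cp-cp"

	// Model containers before and after adding one model (names from the issue).
	before := selectorFor(sdep, predictor, []string{"seldon-test-1", "seldon-copy-7"})
	after := selectorFor(sdep, predictor, []string{"seldon-test-1", "seldon-copy-7", "seldon-copy-9"})

	fmt.Println(before["seldon-app-svc"])
	fmt.Println(after["seldon-app-svc"])
}
```

Under these assumptions the two printed values end in `seldon-copy-7` and `seldon-copy-9`, the same pair seen in the old and new Deployment selectors.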

guhan121 • Jun 01 '22 09:06

Is this related to the fixes in https://github.com/SeldonIO/seldon-core/pull/4043 ?

ukclivecox • Jun 24 '22 06:06

@cliveseldon I don't think #4043 fixes this.

guhan121 • Jul 21 '22 06:07

@guhan121 I'm trying to validate this but can't reproduce it. Can you provide a deployment YAML that works in a previous version of Seldon Core (specifying the exact version) but doesn't work in the latest version, 1.14? I will then be able to validate on our side. Ideally, use one of the example models (e.g. the prepackaged iris model).

axsaucedo • Jul 22 '22 08:07

@guhan121 closing this for now, but if you can provide a manifest to replicate it, we'll be able to test it and re-open (including the PR).

axsaucedo • Sep 05 '22 10:09

error when replacing "/dev/shm/738476230": Deployment.apps "seldon-controller-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"seldon", "app.kubernetes.io/instance":"seldon1", "app.kubernetes.io/name":"seldon", "app.kubernetes.io/version":"v0.5", "control-plane":"seldon-controller-manager"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable

https://github.com/SeldonIO/seldon-core/blob/master/operator/config/manifests/bases/seldon-operator.clusterserviceversion.yaml#L608 I believe this is referring to this deployment

ZiaUrRehman-GBI • Dec 06 '22 14:12

@axsaucedo ^

ZiaUrRehman-GBI • Dec 06 '22 14:12