
The workspace status changed unexpectedly to "Stopped"

Divine1 opened this issue 2 years ago · 18 comments

Summary

I'm receiving the message The workspace status changed unexpectedly to "Stopped". I have no idea what is causing this issue; please help me debug it. It was working fine until yesterday.

Below is the error message shown in the dashboard (screenshot attached).

Below is the workspace pod log (screenshot attached).

Below is the devfile.yaml that I used to create the workspace:

schemaVersion: 2.1.0
metadata:
  name: cbfsel-repo
projects:
  - name: cbfsel-project
    git:
      checkoutFrom:
        revision: master
      remotes:
        origin: https://gitlab.eng.vmware.com/dchelladurai/cbf-sel.git
components:
  - container:
      image: 'artfact-prd.vmware.com:5001/qedocker/eclipseche/customopenjdk8:v10'
      memoryLimit: 4G
      volumeMounts:
        - name: m2volume
          path: /home/user/.m2
    name: javacontainer
  - container:
      image: 'artfact-prd.vmware.com:5001/qedocker/eclipseche/selenium/standalone-chrome:4.3.0-20220706'
      memoryLimit: 4G
      endpoints:
        - exposure: public
          name: 4444-tcp
          protocol: tcp
          targetPort: 4444
        - exposure: public
          name: 5900-tcp
          protocol: tcp
          targetPort: 5900
        - exposure: public
          name: 7900-http
          protocol: http
          targetPort: 7900
          secure: true
    name: chromecontainer
  - name: m2volume
    volume:
      size: 4G
commands:
  - exec:
      commandLine: mvn clean package -DskipTests
      component: javacontainer
      group:
        isDefault: true
        kind: build
      label: 'build project using maven'
      workingDir: '${PROJECT_SOURCE}'
    id: mvnpackage

Relevant information

No response

Divine1 avatar Jul 19 '22 04:07 Divine1

Hi @Divine1, we need more logs to figure out what's going on.

Can you please look at the following when the workspace is starting:

  • the DW pod logs: (DW="cbfsel-repo"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10)
  • the namespace events: kubectl get events -n devine-chelladurai-che-pmjg9u
  • the devworkspace controller logs: kubectl logs -f deploy/devworkspace-controller-manager -n openshift-operators -c devworkspace-controller

And are you able to reproduce this problem systematically, or does it happen only from time to time?

The version of Che you are using and how you installed it are also important information.

l0rd avatar Jul 19 '22 09:07 l0rd

@l0rd thank you for the response.

chectl version (screenshot attached)

I installed Eclipse Che using the command below:

chectl server:deploy --che-operator-cr-patch-yaml=/Users/divine/Documents/office_tasks/TAP-4540/terraformCheJune17/eclipseche_yaml/che-operator-cr-patch.yaml --platform=k8s --installer=operator --debug --k8spoderrorrechecktimeout=1500000 --domain=eclipseche-dchelladurai-chejune15.calatrava.vmware.com --k8spodreadytimeout=1500000

Now I'm receiving a different error message. I have attached the logs below.

kubectl logs -f deploy/devworkspace-controller-manager -n devworkspace-controller -c devworkspace-controller (screenshots attached)

kubectl get events -n divine-chelladurai-che-pmjg9u -w (screenshot attached)

DW="cbfsel-repo"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u (screenshot attached)

I'm looking forward to your suggestions.

Divine1 avatar Jul 19 '22 15:07 Divine1

It looks like there is no PersistentVolume in your cluster that matches the PersistentVolumeClaim created by Che, hence the PVC remains unbound. You may want to decrease the m2volume size in your devfile (or remove the size altogether), or you can create a new PV matching the PVC size.
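
If it helps, a quick way to confirm whether the claim is actually bound and what size it requests (a sketch; claim-devworkspace is assumed to be the common PVC name created by the DevWorkspace operator, and the namespace is the one from the events command above):

# list PVCs in the workspace namespace and check the STATUS column (Bound vs Pending)
kubectl get pvc -n devine-chelladurai-che-pmjg9u

# describe the claim to see its requested size and any binding/provisioning events
kubectl describe pvc claim-devworkspace -n devine-chelladurai-che-pmjg9u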

l0rd avatar Jul 19 '22 17:07 l0rd

@l0rd

The PVC status shows it is Bound to the PV and StorageClass.

Does it have anything to do with the error below in kubectl logs -f deploy/devworkspace-controller-manager -n devworkspace-controller -c devworkspace-controller?

(screenshot of the error attached)

PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
    volumehealth.storage.kubernetes.io/health: accessible
    volumehealth.storage.kubernetes.io/health-timestamp: Tue Jul 19 12:27:08 UTC 2022
  creationTimestamp: "2022-07-19T12:23:39Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: claim-devworkspace
  namespace: divine-chelladurai-che-pmjg9u
  resourceVersion: "15076395"
  uid: e2c47772-dfd2-42b1-865f-2098582d795e
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: vmc-workload-storage-policy-cluster-1
  volumeMode: Filesystem
  volumeName: pvc-e2c47772-dfd2-42b1-865f-2098582d795e
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound

PersistentVolume:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
  creationTimestamp: "2022-07-19T12:23:41Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/csi-vsphere-vmware-com
  name: pvc-e2c47772-dfd2-42b1-865f-2098582d795e
  resourceVersion: "15077963"
  uid: a8b048e7-a823-4b7b-b666-ea4b500f2510
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: claim-devworkspace
    namespace: divine-chelladurai-che-pmjg9u
    resourceVersion: "15075219"
    uid: e2c47772-dfd2-42b1-865f-2098582d795e
  csi:
    driver: csi.vsphere.vmware.com
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: 1655847079993-8081-csi.vsphere.vmware.com
      type: vSphere CNS Block Volume
    volumeHandle: df63f994-eaeb-47a8-ac23-0e222cac3b10-e2c47772-dfd2-42b1-865f-2098582d795e
  persistentVolumeReclaimPolicy: Delete
  storageClassName: vmc-workload-storage-policy-cluster-1
  volumeMode: Filesystem
status:
  phase: Bound

StorageClass:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2022-06-15T05:57:58Z"
  labels:
    isSyncedFromSupervisor: "yes"
  name: vmc-workload-storage-policy-cluster-1
  resourceVersion: "161"
  uid: 3e85fd47-ad27-4c61-bca5-44ef2b8410c7
parameters:
  svStorageClass: vmc-workload-storage-policy-cluster-1
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Divine1 avatar Jul 19 '22 17:07 Divine1

@l0rd

I ran this command: export DW="cbfsel-repo1"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u

I'm able to see a couple of errors here; do they have anything to do with my issue? (screenshots attached)

Divine1 avatar Jul 19 '22 17:07 Divine1

@AObuchow do you have any clue about the PVC problem above?

l0rd avatar Jul 19 '22 17:07 l0rd

A couple of errors are present when viewing logs using the command below: export DW="cbfsel-repo1"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u

(screenshot attached)

Divine1 avatar Jul 19 '22 18:07 Divine1

@l0rd

I used this devfile.yaml (https://github.com/Divine1/demonodejs.git) to create a workspace. The controller.devfile.io/devworkspace_name logs still show an error. Please let me know what the issue might be.

export DW="nodejs-web-app-github"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u

(screenshot attached)

kubectl logs -f deploy/devworkspace-controller-manager -n devworkspace-controller -c devworkspace-controller (screenshot attached)

Divine1 avatar Jul 20 '22 02:07 Divine1

@l0rd

I did chectl server:delete and chectl server:deploy with my updated CheCluster patch to reduce the PVC claim from 10Gi to 5Gi (doc link).

But the PVC claim is still 10Gi when I start a new workspace. What am I doing wrong?

apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: eclipse-che
spec:
  components:
    cheServer:
      extraProperties:
        CHE_INFRA_KUBERNETES_WORKSPACE__START__TIMEOUT__MIN: "15"
        CHE_INFRA_KUBERNETES_PVC_QUANTITY: "5Gi"
    database:
      externalDb: true
      postgresHostName: sc2-10-186-67-195.eng.vmware.com
      postgresPort: "5432"
  networking:
    auth:
      identityProviderURL: https://dex-dchelladurai-chejune15.calatrava.vmware.com
      oAuthClientName: eclipse-che
      oAuthSecret: ZXhhbXBsZS1hcHAtc2VjcmV0

I updated my chectl version (screenshot attached).

I deployed Eclipse Che using the command below:

chectl server:deploy --che-operator-cr-patch-yaml=/Users/divine/Documents/office_tasks/TAP-4540/terraformCheJune17/eclipseche_yaml/che-operator-cr-patch.yaml --platform=k8s --installer=operator --debug --k8spoderrorrechecktimeout=1500000 --domain=eclipseche-dchelladurai-chejune15.calatrava.vmware.com --k8spodreadytimeout=1500000
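
As a sanity check, the size that was actually requested can be read straight from the claim (a sketch reusing the claim name and namespace from the PVC output earlier in this thread):

# print only the storage request of the workspace PVC
kubectl get pvc claim-devworkspace -n divine-chelladurai-che-pmjg9u -o jsonpath='{.spec.resources.requests.storage}'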

Divine1 avatar Jul 20 '22 04:07 Divine1

I faced the same problem. It seems the workspace is stopped immediately after booting up. The DevWorkspace controller log shows:

{"level":"info","ts":1658319000.073287,"logger":"controllers.DevWorkspace","msg":"Reconciling Workspace","Request.Namespace":"che-kube-admin-che-x4dl9t","Request.Name":"bash","devworkspace_id":"workspace0256ca635bb144a8"}
{"level":"info","ts":1658319000.0918732,"logger":"controllers.DevWorkspace","msg":"Workspace stopped with reason","Request.Namespace":"che-kube-admin-che-x4dl9t","Request.Name":"bash","devworkspace_id":"workspace0256ca635bb144a8","stopped-by":"inactivity"}

/cc @dkwon17 Could you have a look, please?

tolusha avatar Jul 20 '22 12:07 tolusha

After updating the operator image, I can't reproduce the The workspace status changed unexpectedly to "Stopped" error anymore.

tolusha avatar Jul 20 '22 14:07 tolusha

Hi @Divine1, could you please run the following command and paste the output:

kubectl get configmap che-idle-settings -n divine-chelladurai-che-pmjg9u -o yaml

dkwon17 avatar Jul 20 '22 14:07 dkwon17

@dkwon17 thank you for looking into this issue.

Below is the output of the command kubectl get configmap che-idle-settings -n divine-chelladurai-che-pmjg9u -o yaml:

Please let me know if any further details are needed (screenshot attached).

Divine1 avatar Jul 20 '22 17:07 Divine1

After updating the operator image, I can't reproduce the The workspace status changed unexpectedly to "Stopped" error anymore.

@tolusha how can I update my operator image?

Currently I use chectl/0.0.20220718-next.aa6153f darwin-x64 node-v16.13.2 (screenshot attached).

Divine1 avatar Jul 20 '22 17:07 Divine1

@Divine1 You can simply delete the operator pod. A new one will come up in a few seconds and the new operator image (quay.io/eclipse/che-operator:next) will be pulled.
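
For example (a sketch; the app=che-operator label and the eclipse-che namespace are assumptions based on a default chectl install on Kubernetes, so adjust them to your setup):

# delete the operator pod; the Deployment recreates it and re-pulls quay.io/eclipse/che-operator:next
kubectl delete pod -l app=che-operator -n eclipse-che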

tolusha avatar Jul 20 '22 17:07 tolusha

@tolusha I updated the Che operator image as you suggested. I used this repo to test the scenario below: https://github.com/Divine1/demochenodejs.git

My chectl version: chectl/0.0.20220718-next.aa6153f darwin-x64 node-v16.13.2

Now I receive the error below, but my PVC, PV, and StorageClasses are available, as shown in the screenshots below.

(screenshots attached)

kubectl get events -n divine-chelladurai-che-pmjg9u -w (screenshot attached)

export DW="nodejs-web-app-githubche"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u (screenshot attached)

kubectl logs -f deploy/devworkspace-controller-manager -n devworkspace-controller -c devworkspace-controller (screenshot attached)

I don't know how to debug and fix this issue. Based on the logs, the PVC and PV do not show any issue here.

Please help me with this.

Divine1 avatar Jul 20 '22 18:07 Divine1

Thank you @Divine1,

Could you check what happens if you delete the labels and annotations from the che-idle-settings configmap?

i.e., run

 kubectl patch configmap che-idle-settings -n divine-chelladurai-che-pmjg9u --type=json -p='[{"op": "remove", "path": "/metadata/annotations"}]'

and

  kubectl patch configmap che-idle-settings -n divine-chelladurai-che-pmjg9u --type=json -p='[{"op": "remove", "path": "/metadata/labels"}]'

Then, when you start a workspace do you get the The workspace status changed unexpectedly to "Stopped" error again?

dkwon17 avatar Jul 20 '22 23:07 dkwon17

@Divine1 Try inspecting the logs from the storage provisioner container. They might contain some clues.
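
For a vSphere CSI setup like this one, something along these lines may work (a sketch; the vmware-system-csi namespace, the vsphere-csi-controller deployment, and the container names are assumptions that depend on how the CSI driver was installed):

# the external provisioner sidecar logs PVC provisioning errors
kubectl logs deploy/vsphere-csi-controller -n vmware-system-csi -c csi-provisioner

# the main driver container may also report attach/mount problems
kubectl logs deploy/vsphere-csi-controller -n vmware-system-csi -c vsphere-csi-controller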

tolusha avatar Jul 21 '22 14:07 tolusha

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

che-bot avatar Jan 17 '23 00:01 che-bot