
The workspace status changed unexpectedly to "Stopped"

Divine1 opened this issue 2 years ago · 18 comments

Summary

I'm receiving the message The workspace status changed unexpectedly to "Stopped". I have no idea what is causing this issue; please help me debug it. It was working fine until yesterday.

Below is the error message shown in the dashboard (screenshot attached).

Below is the workspace pod log (screenshot attached).

Below is the devfile.yaml that I used to create the workspace:

schemaVersion: 2.1.0
metadata:
  name: cbfsel-repo
projects:
  - name: cbfsel-project
    git:
      checkoutFrom:
        revision: master
      remotes:
        origin: https://gitlab.eng.vmware.com/dchelladurai/cbf-sel.git
components:
  - container:
      image: 'artfact-prd.vmware.com:5001/qedocker/eclipseche/customopenjdk8:v10'
      memoryLimit: 4G
      volumeMounts:
        - name: m2volume
          path: /home/user/.m2
    name: javacontainer
  - container:
      image: 'artfact-prd.vmware.com:5001/qedocker/eclipseche/selenium/standalone-chrome:4.3.0-20220706'
      memoryLimit: 4G
      endpoints:
        - exposure: public
          name: 4444-tcp
          protocol: tcp
          targetPort: 4444
        - exposure: public
          name: 5900-tcp
          protocol: tcp
          targetPort: 5900
        - exposure: public
          name: 7900-http
          protocol: http
          targetPort: 7900
          secure: true
    name: chromecontainer
  - name: m2volume
    volume:
      size: 4G
commands:
  - exec:
      commandLine: mvn clean package -DskipTests
      component: javacontainer
      group:
        isDefault: true
        kind: build
      label: 'build project using maven'
      workingDir: '${PROJECT_SOURCE}'
    id: mvnpackage

Relevant information

No response

Divine1 avatar Jul 19 '22 04:07 Divine1

Hi @Divine1, we need more logs to figure out what's going on.

Can you please look at the following when the workspace is starting:

  • the DW pod logs: (DW="cbfsel-repo"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10)
  • the namespace events: kubectl get events -n devine-chelladurai-che-pmjg9u
  • the devworkspace controller logs: kubectl logs -f deploy/devworkspace-controller-manager -n openshift-operators -c devworkspace-controller

And are you able to reproduce this problem systematically, or does it happen only from time to time?

The version of Che you are using and how you installed it are also important information.

l0rd avatar Jul 19 '22 09:07 l0rd

@l0rd thank you for the response.

chectl version (screenshot attached)

I installed Eclipse Che using the command below:

chectl server:deploy --che-operator-cr-patch-yaml=/Users/divine/Documents/office_tasks/TAP-4540/terraformCheJune17/eclipseche_yaml/che-operator-cr-patch.yaml --platform=k8s --installer=operator --debug --k8spoderrorrechecktimeout=1500000 --domain=eclipseche-dchelladurai-chejune15.calatrava.vmware.com --k8spodreadytimeout=1500000

Now I'm receiving a different error message. I have attached the logs below.

kubectl logs -f deploy/devworkspace-controller-manager -n devworkspace-controller -c devworkspace-controller (screenshots attached)

kubectl get events -n divine-chelladurai-che-pmjg9u -w (screenshot attached)

DW="cbfsel-repo"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u (screenshot attached)

I'm looking forward to your suggestions.

Divine1 avatar Jul 19 '22 15:07 Divine1

It looks like there is no PersistentVolume in your cluster that matches the PersistentVolumeClaim created by Che, hence the PVC remains unbound. You may want to decrease the m2volume size in your devfile (or remove the size altogether), or you can create a new PV matching the PVC size.
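
If it helps, a quick way to confirm whether the claim is actually bound and what size it requests (a sketch; claim-devworkspace is assumed to be the common PVC name created by the DevWorkspace operator, and the namespace is the one from the events command above):

# list PVCs in the workspace namespace and check the STATUS column (Bound vs Pending)
kubectl get pvc -n devine-chelladurai-che-pmjg9u

# describe the claim to see its requested size and any binding/provisioning events
kubectl describe pvc claim-devworkspace -n devine-chelladurai-che-pmjg9u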

l0rd avatar Jul 19 '22 17:07 l0rd

@l0rd

The PVC status shows it is Bound to the PV and StorageClass.

Does it have anything to do with the error below in kubectl logs -f deploy/devworkspace-controller-manager -n devworkspace-controller -c devworkspace-controller?

(screenshot of the error attached)

PersistentVolumeClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
    volumehealth.storage.kubernetes.io/health: accessible
    volumehealth.storage.kubernetes.io/health-timestamp: Tue Jul 19 12:27:08 UTC 2022
  creationTimestamp: "2022-07-19T12:23:39Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: claim-devworkspace
  namespace: divine-chelladurai-che-pmjg9u
  resourceVersion: "15076395"
  uid: e2c47772-dfd2-42b1-865f-2098582d795e
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: vmc-workload-storage-policy-cluster-1
  volumeMode: Filesystem
  volumeName: pvc-e2c47772-dfd2-42b1-865f-2098582d795e
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  phase: Bound

PersistentVolume:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
  creationTimestamp: "2022-07-19T12:23:41Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/csi-vsphere-vmware-com
  name: pvc-e2c47772-dfd2-42b1-865f-2098582d795e
  resourceVersion: "15077963"
  uid: a8b048e7-a823-4b7b-b666-ea4b500f2510
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: claim-devworkspace
    namespace: divine-chelladurai-che-pmjg9u
    resourceVersion: "15075219"
    uid: e2c47772-dfd2-42b1-865f-2098582d795e
  csi:
    driver: csi.vsphere.vmware.com
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: 1655847079993-8081-csi.vsphere.vmware.com
      type: vSphere CNS Block Volume
    volumeHandle: df63f994-eaeb-47a8-ac23-0e222cac3b10-e2c47772-dfd2-42b1-865f-2098582d795e
  persistentVolumeReclaimPolicy: Delete
  storageClassName: vmc-workload-storage-policy-cluster-1
  volumeMode: Filesystem
status:
  phase: Bound

StorageClass:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2022-06-15T05:57:58Z"
  labels:
    isSyncedFromSupervisor: "yes"
  name: vmc-workload-storage-policy-cluster-1
  resourceVersion: "161"
  uid: 3e85fd47-ad27-4c61-bca5-44ef2b8410c7
parameters:
  svStorageClass: vmc-workload-storage-policy-cluster-1
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Divine1 avatar Jul 19 '22 17:07 Divine1

@l0rd

I ran this command: export DW="cbfsel-repo1"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u

I'm able to see a couple of errors here; do they have anything to do with my issue? (screenshots attached)

Divine1 avatar Jul 19 '22 17:07 Divine1

@AObuchow do you have any clue about the PVC problem above?

l0rd avatar Jul 19 '22 17:07 l0rd

A couple of errors are present when viewing logs using the command below: export DW="cbfsel-repo1"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u

(screenshot attached)

Divine1 avatar Jul 19 '22 18:07 Divine1

@l0rd

I used this devfile.yaml (https://github.com/Divine1/demonodejs.git) to create a workspace. The controller.devfile.io/devworkspace_name logs still show an error. Please let me know what the issue might be.

export DW="nodejs-web-app-github"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u

(screenshot attached)

kubectl logs -f deploy/devworkspace-controller-manager -n devworkspace-controller -c devworkspace-controller (screenshot attached)

Divine1 avatar Jul 20 '22 02:07 Divine1

@l0rd

I did chectl server:delete and chectl server:deploy with my updated CheCluster patch to reduce the PVC claim from 10Gi to 5Gi (doc link).

But the PVC claim is still 10Gi when I start a new workspace. What am I doing wrong?

apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: eclipse-che
spec:
  components:
    cheServer:
      extraProperties:
        CHE_INFRA_KUBERNETES_WORKSPACE__START__TIMEOUT__MIN: "15"
        CHE_INFRA_KUBERNETES_PVC_QUANTITY: "5Gi"
    database:
      externalDb: true
      postgresHostName: sc2-10-186-67-195.eng.vmware.com
      postgresPort: "5432"
  networking:
    auth:
      identityProviderURL: https://dex-dchelladurai-chejune15.calatrava.vmware.com
      oAuthClientName: eclipse-che
      oAuthSecret: ZXhhbXBsZS1hcHAtc2VjcmV0

I updated my chectl version (screenshot attached).

I deployed Eclipse Che using the command below:

chectl server:deploy --che-operator-cr-patch-yaml=/Users/divine/Documents/office_tasks/TAP-4540/terraformCheJune17/eclipseche_yaml/che-operator-cr-patch.yaml --platform=k8s --installer=operator --debug --k8spoderrorrechecktimeout=1500000 --domain=eclipseche-dchelladurai-chejune15.calatrava.vmware.com --k8spodreadytimeout=1500000
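
As a sanity check, the size that was actually requested can be read straight from the claim (a sketch reusing the claim name and namespace from the PVC output earlier in this thread):

# print only the storage request of the workspace PVC
kubectl get pvc claim-devworkspace -n divine-chelladurai-che-pmjg9u -o jsonpath='{.spec.resources.requests.storage}'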

Divine1 avatar Jul 20 '22 04:07 Divine1

I faced the same problem. It seems the workspace is stopped immediately after booting up. The DevWorkspace controller log shows:

{"level":"info","ts":1658319000.073287,"logger":"controllers.DevWorkspace","msg":"Reconciling Workspace","Request.Namespace":"che-kube-admin-che-x4dl9t","Request.Name":"bash","devworkspace_id":"workspace0256ca635bb144a8"}
{"level":"info","ts":1658319000.0918732,"logger":"controllers.DevWorkspace","msg":"Workspace stopped with reason","Request.Namespace":"che-kube-admin-che-x4dl9t","Request.Name":"bash","devworkspace_id":"workspace0256ca635bb144a8","stopped-by":"inactivity"}

/cc @dkwon17 Could you have a look, please?

tolusha avatar Jul 20 '22 12:07 tolusha

After updating the operator image, I can't reproduce the The workspace status changed unexpectedly to "Stopped" error anymore.

tolusha avatar Jul 20 '22 14:07 tolusha

Hi @Divine1, could you please run the following command and paste the output:

kubectl get configmap che-idle-settings -n divine-chelladurai-che-pmjg9u -o yaml

dkwon17 avatar Jul 20 '22 14:07 dkwon17

@dkwon17 thank you for looking into this issue.

Below is the output of the command kubectl get configmap che-idle-settings -n divine-chelladurai-che-pmjg9u -o yaml:

Please let me know if any further details are needed (screenshot attached).

Divine1 avatar Jul 20 '22 17:07 Divine1

After updating the operator image, I can't reproduce the The workspace status changed unexpectedly to "Stopped" error anymore.

@tolusha how can I update my operator image?

Currently I use chectl/0.0.20220718-next.aa6153f darwin-x64 node-v16.13.2 (screenshot attached).

Divine1 avatar Jul 20 '22 17:07 Divine1

@Divine1 You can simply delete the operator pod. A new one will come up in a few seconds and the new operator image (quay.io/eclipse/che-operator:next) will be pulled.
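
For example (a sketch; the app=che-operator label and the eclipse-che namespace are assumptions based on a default chectl install on Kubernetes, so adjust them to your setup):

# delete the operator pod; the Deployment recreates it and re-pulls quay.io/eclipse/che-operator:next
kubectl delete pod -l app=che-operator -n eclipse-che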

tolusha avatar Jul 20 '22 17:07 tolusha

@tolusha I updated the Che operator image as you suggested. I used this repo to test the scenario below: https://github.com/Divine1/demochenodejs.git

My chectl version: chectl/0.0.20220718-next.aa6153f darwin-x64 node-v16.13.2

Now I receive the error below, but my PVC, PV, and StorageClasses are available, as shown in the screenshots below.

(screenshots attached)

kubectl get events -n divine-chelladurai-che-pmjg9u -w (screenshot attached)

export DW="nodejs-web-app-githubche"; kubectl logs -f -l controller.devfile.io/devworkspace_name="${DW}" --all-containers --max-log-requests 10 -n divine-chelladurai-che-pmjg9u (screenshot attached)

kubectl logs -f deploy/devworkspace-controller-manager -n devworkspace-controller -c devworkspace-controller (screenshot attached)

I don't know how to debug and fix this issue. Based on the logs, the PVC and PV do not show any issue here.

Please help me with this.

Divine1 avatar Jul 20 '22 18:07 Divine1

Thank you @Divine1,

Could you check what happens if you delete the labels and annotations from the che-idle-settings configmap?

i.e., run

 kubectl patch configmap che-idle-settings -n divine-chelladurai-che-pmjg9u --type=json -p='[{"op": "remove", "path": "/metadata/annotations"}]'

and

  kubectl patch configmap che-idle-settings -n divine-chelladurai-che-pmjg9u --type=json -p='[{"op": "remove", "path": "/metadata/labels"}]'

Then, when you start a workspace do you get the The workspace status changed unexpectedly to "Stopped" error again?

dkwon17 avatar Jul 20 '22 23:07 dkwon17

@Divine1 Try inspecting the logs from the storage provisioner container. They might contain some clues.
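
For a vSphere CSI setup like this one, something along these lines may work (a sketch; the vmware-system-csi namespace, the vsphere-csi-controller deployment, and the container names are assumptions that depend on how the CSI driver was installed):

# the external provisioner sidecar logs PVC provisioning errors
kubectl logs deploy/vsphere-csi-controller -n vmware-system-csi -c csi-provisioner

# the main driver container may also report attach/mount problems
kubectl logs deploy/vsphere-csi-controller -n vmware-system-csi -c vsphere-csi-controller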

tolusha avatar Jul 21 '22 14:07 tolusha

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

che-bot avatar Jan 17 '23 00:01 che-bot