Devworkspace storage-type attribute temporarily removed when restarting from local devfile

Open cgruver opened this issue 1 year ago • 14 comments

Describe the bug

Eclipse Che (Dev Spaces) appears to be creating a user scoped PVC when none has been requested.

Che version

7.88

Steps to reproduce

Create a CheCluster:

apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:              
  name: devspaces  
  namespace: devspaces
spec:                         
  components:                  
    cheServer:      
      debug: false
      logLevel: INFO
    metrics:                
      enable: true
    pluginRegistry:
      openVSXURL: https://open-vsx.org
  containerRegistry: {}      
  devEnvironments:       
    startTimeoutSeconds: 300
    secondsOfRunBeforeIdling: -1
    maxNumberOfWorkspacesPerUser: -1
    maxNumberOfRunningWorkspacesPerUser: 5
    containerBuildConfiguration:
      openShiftSecurityContextConstraint: container-build
    disableContainerBuildCapabilities: false
    defaultComponents:
    - name: dev-tools
      container:
        image: quay.io/cgruver0/che/dev-tools:latest
        memoryLimit: 6Gi
        mountSources: true
    defaultEditor: che-incubator/che-code/latest
    defaultNamespace:
      autoProvision: true
      template: <username>-devspaces
    secondsOfInactivityBeforeIdling: 1800
    storage:
      pvcStrategy: per-workspace
      perUserStrategyPvcConfig:
        storageClass: ${STORAGE_CLASS}
      perWorkspaceStrategyPvcConfig:
        storageClass: ${STORAGE_CLASS}
  gitServices: {}
  networking: {}

Login as a user and create workspaces.

After an unknown amount of time, you will observe an unbound PVC for your user.

[Screenshot: 2024-08-22 at 8:22:32 AM]

Expected behavior

The operator should not create a per-user PVC when pvcStrategy is per-workspace.

Runtime

OpenShift

Screenshots

No response

Installation method

OperatorHub

Environment

macOS

Eclipse Che Logs

No response

Additional context

Note: The creation of the rogue PVC does not appear to be associated with any workspace lifecycle events that I can find.

If I delete the rogue PVC, it will eventually get recreated.

Aug 22 '24 12:08 cgruver

@AObuchow @dkwon17 could you please take a look, and change the severity if P2 is not high enough?

Aug 22 '24 14:08 svor

@cgruver The rogue PVC is the claim-devworkspace PVC, correct? It's very strange that this is getting created, especially given that you're using the per-workspace PVC strategy and not the per-user PVC strategy.

I wonder if this has to do with the fact that you are specifying a storageClass for the perUserStrategyPvcConfig (though that would definitely be a bug).

Aug 22 '24 17:08 AObuchow

> @cgruver The rogue PVC is the claim-devworkspace PVC, correct? It's very strange that this is getting created, especially given that you're using the per-workspace PVC strategy and not the per-user PVC strategy.

Correct.

> I wonder if this has to do with the fact that you are specifying a storageClass for the perUserStrategyPvcConfig (though that would definitely be a bug).

I tried removing the perUserStrategyPvcConfig entry. It still creates that rogue PVC.

Aug 22 '24 17:08 cgruver

@cgruver Do you mind sharing the DevWorkspace Operator logs if possible? They should show whether the claim-devworkspace PVC is being created. I'm currently trying to reproduce this issue after having set up the hostpath-csi storage class.

Edit: I wasn't able to reproduce the issue before my cluster expired. However, I wonder if this bug is specific to using the hostpath-csi storage class?

Aug 22 '24 18:08 AObuchow

@AObuchow I've also seen this behavior with Rook/Ceph, aka OpenShift Data Foundation.

I'll grab the logs for you.

Here's another interesting data point. If I do not specify perUserStrategyPvcConfig but let it just use the default StorageClass, I do not see this behavior.

The only reason that I stumbled onto this issue is because I was running a cluster with multiple storage classes... I first noticed it when running a cluster with both Rook/Ceph and a Qnap CSI driver.

In that first case, I was using the QNAP provisioner for Dev Spaces and Rook/Ceph as the default for everything else.

The rogue PVC was provisioned with the default storage class, aka Rook/Ceph, whereas all of the workspace PVCs used the specified QNAP CSI storage class.

Aug 23 '24 19:08 cgruver

@AObuchow Logs from devworkspace-controller-manager

claim-devworkspace is the rogue PVC that gets created.

I haven't yet found a trigger event, but there are some interesting things to point out:

  1. The workspaces that it seems to be created for do not have the attribute: controller.devfile.io/storage-type: per-workspace set in their devfile. This does not happen for workspaces with that attribute, as expected.
  2. It only seems to happen if I have a setting for perUserStrategyPvcConfig.storageClass. If I let it use the default storageClass, this does not occur.

Current CheCluster:

apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: devspaces
  namespace: devspaces
spec:
  components:
    cheServer:
      debug: false
      logLevel: INFO
    dashboard:
      logLevel: ERROR
    devWorkspace: {}
    devfileRegistry: {}
    imagePuller:
      enable: false
      spec: {}
    metrics:
      enable: true
    pluginRegistry:
      openVSXURL: 'https://open-vsx.org'
  containerRegistry: {}
  devEnvironments:
    startTimeoutSeconds: 300
    security: {}
    secondsOfRunBeforeIdling: -1
    maxNumberOfWorkspacesPerUser: -1
    containerBuildConfiguration:
      openShiftSecurityContextConstraint: container-build
    disableContainerBuildCapabilities: false
    defaultEditor: che-incubator/che-code/latest
    maxNumberOfRunningWorkspacesPerUser: 5
    defaultComponents:
      - container:
          image: 'quay.io/cgruver0/che/dev-tools:latest'
          memoryLimit: 6Gi
          mountSources: true
          sourceMapping: /projects
        name: dev-tools
    defaultNamespace:
      autoProvision: true
      template: <username>-devspaces
    secondsOfInactivityBeforeIdling: 1800
    storage:
      perUserStrategyPvcConfig:
        storageClass: hostpath-csi
      pvcStrategy: per-workspace
  gitServices: {}
  networking:
    auth:
      gateway:
        configLabels:
          app: che
          component: che-gateway-config

devworkspace-controller-manager-cfd46bdfd-v29w9-devworkspace-controller.log

Aug 26 '24 18:08 cgruver

Relevant Devfiles:

https://github.com/dora-metrics/dev-workspace/blob/main/devfile.yaml
https://github.com/cgruver/devspaces-backstage-plugin/blob/main/devfile.yaml

Aug 26 '24 18:08 cgruver

@cgruver Thank you so much for the investigation and helpful information. This is indeed a strange edge-case bug.

> The workspaces that it seems to be created for do not have the attribute: controller.devfile.io/storage-type: per-workspace set in their devfile. This does not happen for workspaces with that attribute, as expected.

Do the devworkspaces (as opposed to the devfiles) have the controller.devfile.io/storage-type attribute set, or is it missing?

Aug 28 '24 19:08 AObuchow

@AObuchow here's the DevWorkspace for the devspaces-backstage-plugin workspace:

apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspace
metadata:
  annotations:
    che.eclipse.org/che-editor: che-incubator/che-code/latest
    che.eclipse.org/devfile: |
      schemaVersion: 2.2.0
      metadata:
        name: devspaces-backstage-plugin
        namespace: cgruver-devspaces
      projects:
        - name: devspaces-backstage-plugin
          git:
            checkoutFrom:
              remote: origin
              revision: main
            remotes:
              origin: https://github.com/cgruver/devspaces-backstage-plugin.git
      components:
        - name: dev-tools
          container:
            image: quay.io/cgruver0/che/node20-dev-tools:latest
            mountSources: true
            memoryRequest: 500Mi
            memoryLimit: 6G
            cpuRequest: 100m
            cpuLimit: 2000m
            env:
              - name: VSCODE_DEFAULT_WORKSPACE
                value: /projects/devspaces-backstage-plugin/workspace.code-workspace
              - name: HOME
                value: /projects/home
            endpoints:
              - name: frontend
                protocol: https
                targetPort: 3000
                exposure: public
              - name: backend
                protocol: https
                targetPort: 7007
                exposure: public
        - volume:
            size: 20Gi
          name: projects
        - name: oc-cli
          container:
            args:
              - '-c'
              - >-
                mkdir -p /projects/bin && cp /usr/bin/oc /projects/bin/oc && cp
                /usr/bin/kubectl /projects/bin/kubectl
            command:
              - /bin/bash
            image: image-registry.openshift-image-registry.svc:5000/openshift/cli:latest
            sourceMapping: /projects
            mountSources: true
            memoryLimit: 256M
        - name: prep-workspace
          container:
            image: quay.io/cgruver0/che/node20-dev-tools:latest
            mountSources: true
            sourceMapping: /projects
            memoryRequest: 128Mi
            memoryLimit: 256Mi
            cpuRequest: 10m
            cpuLimit: 200m
            env:
              - name: HOME
                value: /projects/home
            args:
              - '-c'
              - if [[ -f ${HOME}/.kube/config ]]; then rm ${HOME}/.kube/config; fi
            command:
              - /bin/bash
      commands:
        - apply:
            component: oc-cli
            label: Copy OpenShift CLI
          id: cp-oc-cli
        - apply:
            component: prep-workspace
            label: Prestart Workspace Prep
          id: prep-workspace
      events:
        preStart:
          - cp-oc-cli
          - prep-workspace
      attributes:
        dw.metadata.annotations:
          che.eclipse.org/devfile-source: |
            scm:
              repo: https://github.com/cgruver/devspaces-backstage-plugin.git
              fileName: devfile.yaml
            factory:
              params: url=https://github.com/cgruver/devspaces-backstage-plugin.git
    che.eclipse.org/last-updated-timestamp: '2024-08-29T12:02:52.250Z'
    controller.devfile.io/started-at: '1724932977840'
  resourceVersion: '3035461'
  name: devspaces-backstage-plugin
  uid: 0a5aa228-b310-463c-895d-f43a449d4a3b
  creationTimestamp: '2024-08-22T12:31:02Z'
  generation: 19
  managedFields:
    - apiVersion: workspace.devfile.io/v1alpha2
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:che.eclipse.org/che-editor': {}
            'f:che.eclipse.org/devfile': {}
            'f:che.eclipse.org/last-updated-timestamp': {}
        'f:spec':
          .: {}
          'f:contributions': {}
          'f:routingClass': {}
          'f:started': {}
          'f:template':
            .: {}
            'f:attributes':
              .: {}
              'f:controller.devfile.io/devworkspace-config':
                .: {}
                'f:name': {}
                'f:namespace': {}
              'f:controller.devfile.io/scc': {}
              'f:controller.devfile.io/storage-type': {}
            'f:commands': {}
            'f:components': {}
            'f:events':
              .: {}
              'f:preStart': {}
            'f:projects': {}
      manager: unknown
      operation: Update
      time: '2024-08-29T12:02:52Z'
    - apiVersion: workspace.devfile.io/v1alpha2
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            'f:controller.devfile.io/started-at': {}
          'f:finalizers':
            .: {}
            'v:"rbac.controller.devfile.io"': {}
            'v:"storage.controller.devfile.io"': {}
      manager: devworkspace-controller
      operation: Update
      time: '2024-08-29T12:02:57Z'
    - apiVersion: workspace.devfile.io/v1alpha2
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          .: {}
          'f:conditions': {}
          'f:devworkspaceId': {}
          'f:mainUrl': {}
          'f:message': {}
          'f:phase': {}
      manager: devworkspace-controller
      operation: Update
      subresource: status
      time: '2024-08-29T12:02:57Z'
  namespace: cgruver-devspaces
  finalizers:
    - rbac.controller.devfile.io
    - storage.controller.devfile.io
  labels:
    controller.devfile.io/creator: 08772211-1f25-49a7-9f77-1f9a3f75c2a1
spec:
  contributions:
    - kubernetes:
        name: che-code-devspaces-backstage-plugin
      name: editor
  routingClass: che
  started: true
  template:
    attributes:
      controller.devfile.io/devworkspace-config:
        name: devworkspace-config
        namespace: devspaces
      controller.devfile.io/scc: container-build
      controller.devfile.io/storage-type: per-workspace
    commands:
      - apply:
          component: prep-workspace
          label: Prestart Workspace Prep
        id: prep-workspace
    components:
      - container:
          cpuRequest: 100m
          env:
            - name: VSCODE_DEFAULT_WORKSPACE
              value: /projects/devspaces-backstage-plugin/workspace.code-workspace
            - name: HOME
              value: /projects/home
          memoryRequest: 500Mi
          sourceMapping: /projects
          cpuLimit: 2000m
          memoryLimit: 6G
          image: 'quay.io/cgruver0/che/node20-dev-tools:latest'
          endpoints:
            - exposure: public
              name: frontend
              protocol: https
              targetPort: 3000
            - exposure: public
              name: backend
              protocol: https
              targetPort: 7007
          mountSources: true
        name: dev-tools
      - name: projects
        volume:
          size: 20Gi
      - container:
          cpuRequest: 10m
          command:
            - /bin/bash
          env:
            - name: HOME
              value: /projects/home
          memoryRequest: 128Mi
          sourceMapping: /projects
          cpuLimit: 200m
          memoryLimit: 256Mi
          image: 'image-registry.openshift-image-registry.svc:5000/openshift/cli:latest'
          args:
            - '-c'
            - 'mkdir -p /projects/bin && cp /usr/bin/oc /projects/bin/oc && cp /usr/bin/kubectl /projects/bin/kubectl && if [[ -f ${HOME}/.kube/config ]]; then rm ${HOME}/.kube/config; fi'
          mountSources: true
        name: prep-workspace
    events:
      preStart:
        - prep-workspace
    projects:
      - git:
          checkoutFrom:
            remote: origin
            revision: main
          remotes:
            origin: 'https://github.com/cgruver/devspaces-backstage-plugin.git'
        name: devspaces-backstage-plugin
status:
  conditions:
    - lastTransitionTime: '2024-08-29T12:02:52Z'
      message: DevWorkspace is starting
      status: 'True'
      type: Started
    - lastTransitionTime: '2024-08-29T12:02:52Z'
      message: Resolved plugins and parents from DevWorkspace
      status: 'True'
      type: DevWorkspaceResolved
    - lastTransitionTime: '2024-08-29T12:02:52Z'
      message: Storage ready
      status: 'True'
      type: StorageReady
    - lastTransitionTime: '2024-08-29T12:02:52Z'
      message: Networking ready
      status: 'True'
      type: RoutingReady
    - lastTransitionTime: '2024-08-29T12:02:52Z'
      message: DevWorkspace serviceaccount ready
      status: 'True'
      type: ServiceAccountReady
    - lastTransitionTime: '2024-08-29T12:02:52Z'
      message: DevWorkspace secrets ready
      status: 'True'
      type: PullSecretsReady
    - lastTransitionTime: '2024-08-29T12:02:57Z'
      message: DevWorkspace deployment ready
      status: 'True'
      type: DeploymentReady
    - lastTransitionTime: '2024-08-29T12:02:57Z'
      status: 'True'
      type: Ready
  devworkspaceId: workspace0a5aa228b310463c
  mainUrl: 'https://devspaces.apps.control-plane.clg.lab/cgruver/devspaces-backstage-plugin/3100/'
  message: 'https://devspaces.apps.control-plane.clg.lab/cgruver/devspaces-backstage-plugin/3100/'
  phase: Running

Aug 29 '24 12:08 cgruver

It does have that attribute set.

Aug 29 '24 12:08 cgruver

@cgruver Sorry for the delayed reply. I'm still puzzled by this bug and have been preoccupied with other tasks :l The fact that I haven't been able to reproduce it manually yet also complicates things a bit.

Sep 03 '24 21:09 AObuchow

@AObuchow I think I caught it in the act.

It appears to happen when I make a change to the devfile and restart from local devfile.

Above is a fresh log file.

Sep 26 '24 12:09 cgruver

@cgruver Thank you so much!! I was finally able to reproduce this myself.

To reproduce:

  1. Ensure you do not have a common PVC created in your Che user namespace (i.e. if you have a claim-devworkspace PVC, delete it)
  2. Create a workspace from a devfile (e.g. this one) and ensure the per-workspace storage strategy is being used. On the Che dogfooding cluster, per-workspace is the default storage strategy.
  3. Restart your workspace from the local devfile: View -> Command Palette -> Restart workspace from local devfile
  4. A claim-devworkspace PVC will be created in your Che user namespace even though you're using per-workspace storage
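
Steps 1 and 4 both come down to checking whether the common PVC exists. A minimal sketch of that check follows, assuming the PVC is named claim-devworkspace as stated above. The helper name has_common_pvc is hypothetical, and the namespace in the usage comment is taken from the DevWorkspace shown earlier, so substitute your own Che user namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Che or the DevWorkspace Operator).
# Reads the output of `kubectl get pvc -o name` on stdin and succeeds
# only if the common PVC, claim-devworkspace, is listed.
has_common_pvc() {
  grep -qx 'persistentvolumeclaim/claim-devworkspace'
}

# Usage (the namespace is an assumption; use your Che user namespace):
#   kubectl get pvc -n cgruver-devspaces -o name | has_common_pvc \
#     && echo "common PVC exists" \
#     || echo "no common PVC"
```

Running it before step 3 and again after step 4 should show the PVC going from absent to present if the bug reproduces.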

In my testing, restarting the workspace from the local devfile appears to temporarily remove the devworkspace's spec.template.attributes and then add them back. While they are gone, the controller.devfile.io/storage-type attribute is missing too, so the default common/per-user storage strategy is used for that period and the claim-devworkspace PVC gets provisioned.

To observe this, run the following in your terminal, and then restart your workspace from the local devfile:

while true; do
  kubectl get dw devspaces-backstage-plugin -o jsonpath='{.spec.template.attributes}' 
  echo -e "\n"
  sleep 1
done

In the output, you'll see the devworkspace attributes, but then for a brief moment, the attributes will be gone (indicated by empty newlines) and then re-appear:

{"controller.devfile.io/devworkspace-config":{"name":"devworkspace-config","namespace":"dogfooding"},"controller.devfile.io/scc":"container-build","controller.devfile.io/storage-type":"per-workspace"}

{"controller.devfile.io/devworkspace-config":{"name":"devworkspace-config","namespace":"dogfooding"},"controller.devfile.io/scc":"container-build","controller.devfile.io/storage-type":"per-workspace"}

{"controller.devfile.io/devworkspace-config":{"name":"devworkspace-config","namespace":"dogfooding"},"controller.devfile.io/scc":"container-build","controller.devfile.io/storage-type":"per-workspace"}















{"controller.devfile.io/devworkspace-config":{"name":"devworkspace-config","namespace":"dogfooding"},"controller.devfile.io/scc":"container-build","controller.devfile.io/storage-type":"per-workspace"}
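
The blank lines are easy to miss in a long scroll. As a rough variation on the loop above, the sketch below prints one timestamped line per poll stating whether the storage-type attribute is present. The function names storage_type_state and watch_storage_type are hypothetical, and the string match on the attribute key is a simplification rather than real JSON parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one snapshot of .spec.template.attributes.
# Prints "present" if the storage-type key appears in the JSON, else "missing".
storage_type_state() {
  case "$1" in
    *'"controller.devfile.io/storage-type"'*) echo present ;;
    *) echo missing ;;
  esac
}

# Hypothetical helper: poll a DevWorkspace once per second and print a
# timestamped state line. Arguments: DevWorkspace name, namespace.
watch_storage_type() {
  local name="$1" namespace="$2" attrs
  while true; do
    attrs="$(kubectl get dw "$name" -n "$namespace" \
      -o jsonpath='{.spec.template.attributes}')"
    printf '%s %s\n' "$(date -u +%H:%M:%S)" "$(storage_type_state "$attrs")"
    sleep 1
  done
}

# Usage (name and namespace are taken from the DevWorkspace earlier in this thread):
#   watch_storage_type devspaces-backstage-plugin cgruver-devspaces
```

A run of "missing" lines during the restart would correspond to the window in which the attributes are absent. A one-second poll can still miss a shorter gap.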

@vitaliy-guliy Do you know if the restart from local devfile command is changing the devworkspace object on the cluster in multiple "steps" rather than in a single update? If so, that would probably cause this bug.

Sep 27 '24 19:09 AObuchow