
Users are unable to debug why a workspace is failing to start due to lack of accessible logs.

Open achdmbp opened this issue 8 months ago • 1 comments

Describe the bug

When a bug in a postStart command causes a workspace to fail to start, it is impossible to retain the errors behind the failure, so the user cannot troubleshoot or understand the root cause.

Che version

7.95

Steps to reproduce

  1. Create an empty workspace.
  2. Create a /projects/devfile.yaml with the following content:
schemaVersion: 2.2.0
metadata:
  name: failing-post-start-ws
components:
  - container:
      image: nexus-docker.enterprise-tools.aws.delta.com/com/delta/dx/udi:latest
      sourceMapping: /projects
    name: tools
commands:
  - id: failing-command
    exec:
      commandLine: "echo 'I fail' && exit 1"
      component: tools
events:
  postStart:
    - failing-command
  3. Run the "Restart from local devfile" action and select the devfile.yaml created in the step above.

  4. The workspace restart fails with the following error:

Failed to open the workspace
Error creating DevWorkspace deployment: Detected unrecoverable event FailedPostStartHook: PostStartHook failed.

Examining the logs of the main container "tools" reveals no information that helps the developer understand why the workspace is failing.

The workspace deployment YAML shows the postStart command as follows. Note that stdout and stderr are redirected to /tmp/poststart-stdout.txt and /tmp/poststart-stderr.txt inside the container, which makes it impossible to retain the errors that cause the workspace to fail to start.

          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - |
                    {
                    echo 'I fail' && exit 1
                    nohup /checode/entrypoint-volume.sh > /checode/entrypoint-logs.txt 2>&1 &
                    } 1>/tmp/poststart-stdout.txt 2>/tmp/poststart-stderr.txt
          name: tools
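
To illustrate the redirection outside a cluster, here is a minimal local sketch. It assumes a plain POSIX sh; the scratch directory `log_dir` is a hypothetical stand-in for /tmp inside the "tools" container, and the checode entrypoint line is left out. It only mimics the shape of the generated hook and is not the operator's actual code.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for /tmp inside the "tools" container.
log_dir=$(mktemp -d)

# Same shape as the generated hook: group the commands and send both streams to files.
# Whatever the caller can still see is captured in visible_output.
visible_output=$(sh -c '{
echo "I fail" && exit 1
} 1>"$1/poststart-stdout.txt" 2>"$1/poststart-stderr.txt"' sh "$log_dir" 2>&1)
hook_status=$?

echo "hook exit status: $hook_status"
echo "output visible to the caller: [$visible_output]"
echo "captured stdout file: [$(cat "$log_dir/poststart-stdout.txt")]"
echo "captured stderr file: [$(cat "$log_dir/poststart-stderr.txt")]"
```

The hook exits non-zero, but the "I fail" message only exists in the stdout file; nothing reaches the caller, which matches the empty container logs described above.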

Expected behavior

Developers should be able to view the stdout and stderr of the postStart command that is failing. This would allow users to see the "I fail" output from the command above, facilitating easier debugging and resolution of the issue.

Runtime

OpenShift

Screenshots

No response

Installation method

OperatorHub

Environment

Linux, macOS

Eclipse Che Logs


Additional context

No response

achdmbp, Apr 08 '25 16:04

IIRC this is unfortunately not trivial to do due to recent changes to Kubernetes. See https://github.com/devfile/devworkspace-operator/issues/1324

However, I had an idea to force postStart events to always succeed, though this could lead to weird behaviour where the workspace starts up in an invalid/unexpected state. Something like:

      lifecycle:
        postStart:
          exec:
            command:
              - /bin/sh
              - '-c'
              - |
                {
                  command1
                  command2
                  command3
                } 1> /tmp/poststart-stdout.txt 2> /tmp/poststart-stderr.txt || true # Force postStart event to succeed with || true

IMO if this route were taken, we'd want the editor to display a notification to the user that some postStart events failed.
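
One caveat with the `|| true` idea, sketched below under the assumption of a POSIX sh such as dash or bash: a `{ ... }` group runs in the current shell, so an explicit `exit 1` inside it (as in the reproducer from this issue) ends the whole `sh -c` before `|| true` is reached. A `( ... )` subshell confines the `exit`. The `log_dir` directory and file names are hypothetical and used only for this local demo.

```shell
#!/bin/sh
# Hypothetical scratch directory for the redirected output.
log_dir=$(mktemp -d)

# Brace group: `exit 1` terminates the sh -c shell itself, so `|| true` never runs.
sh -c '{ echo "I fail"; exit 1; } 1>"$1/stdout.txt" 2>"$1/stderr.txt" || true' sh "$log_dir"
brace_status=$?

# Subshell: `exit 1` only leaves the subshell, so `|| true` forces success.
sh -c '( echo "I fail"; exit 1 ) 1>"$1/stdout.txt" 2>"$1/stderr.txt" || true' sh "$log_dir"
subshell_status=$?

echo "brace group status: $brace_status"
echo "subshell status: $subshell_status"
```

If commands in the hook can call `exit` directly, the subshell form appears to be the safer way to guarantee the hook reports success.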

AObuchow, Apr 08 '25 22:04

I see that the controller.devfile.io/debug-start annotation was added in https://github.com/devfile/devworkspace-operator/pull/425 to aid in debugging failed DevWorkspaces:

Debugging a failing workspace

metadata:
  annotations:
    controller.devfile.io/debug-start: "true"

When this is enabled, the Pod for the failed DevWorkspace will not be terminated immediately, allowing time to debug and check the oc describe pod output.

This seems to help in scenarios where the main component container's commands fail, but not when postStart hook commands fail.

rohanKanojia, Sep 23 '25 11:09

Closed via https://github.com/devfile/devworkspace-operator/pull/1522

rohanKanojia, Nov 12 '25 10:11