Users are unable to debug why a workspace is failing to start due to lack of accessible logs.
Describe the bug
When a bug in a postStart command causes a workspace to fail to start, the errors that caused the failure are not retained, so the user cannot troubleshoot or understand the root cause.
Che version
7.95
Steps to reproduce
- Create an empty workspace.
- Create a /projects/devfile.yaml with the following content:
schemaVersion: 2.2.0
metadata:
  name: failing-post-start-ws
components:
  - container:
      image: nexus-docker.enterprise-tools.aws.delta.com/com/delta/dx/udi:latest
      sourceMapping: /projects
    name: tools
commands:
  - id: failing-command
    exec:
      commandLine: "echo 'I fail' && exit 1"
      component: tools
events:
  postStart:
    - failing-command
- Run the "Restart from local devfile" action and select the devfile.yaml created in the previous step.
- The workspace restart fails with the following error:
Failed to open the workspace
Error creating DevWorkspace deployment: Detected unrecoverable event FailedPostStartHook: PostStartHook failed.
Examining the logs of the main container "tools" gives no information that helps the developer understand why the workspace is failing.
The workspace deployment YAML shows the postStart command as follows. Note that stdout and stderr are redirected to /tmp/poststart-stdout.txt and /tmp/poststart-stderr.txt inside the container, so the errors are not retained once the workspace fails to start.
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - '-c'
        - |
          {
          echo 'I fail' && exit 1
          nohup /checode/entrypoint-volume.sh > /checode/entrypoint-logs.txt 2>&1 &
          } 1>/tmp/poststart-stdout.txt 2>/tmp/poststart-stderr.txt
name: tools
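For anyone who wants to see the effect of that wrapper without a cluster, here is a minimal local sketch (not the operator's actual code). It runs the same brace-group-plus-redirect shape in a child shell. `OUT_DIR` is a hypothetical scratch directory standing in for the container's /tmp, and the `nohup` editor line is replaced by a placeholder echo.

```shell
#!/bin/sh
# Local reproduction of the postStart wrapper shape; no cluster required.
# OUT_DIR is a hypothetical stand-in for the container's /tmp.
OUT_DIR="$(mktemp -d)"
export OUT_DIR

# Run the brace group in a child shell, the way the hook is run via /bin/sh -c.
# "|| STATUS=$?" records the failure without aborting this script.
STATUS=0
sh -c '{
echo "I fail" && exit 1
echo "placeholder for the editor entrypoint, never reached"
} 1>"$OUT_DIR/poststart-stdout.txt" 2>"$OUT_DIR/poststart-stderr.txt"' || STATUS=$?

# Nothing was printed by the hook itself; only its exit status is visible here.
echo "hook exit status: $STATUS"
echo "contents of poststart-stdout.txt: $(cat "$OUT_DIR/poststart-stdout.txt")"
```

The caller only sees the non-zero exit status. The "I fail" text ends up in the stdout file, which in the real workspace lives inside the container's filesystem.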
Expected behavior
Developers should be able to view the stdout and stderr of the postStart command that is failing. This would allow users to see the "I fail" output from the command above, facilitating easier debugging and resolution of the issue.
Runtime
OpenShift
Screenshots
No response
Installation method
OperatorHub
Environment
Linux, macOS
Eclipse Che Logs
Additional context
No response
IIRC this is unfortunately not trivial to do due to recent changes to Kubernetes. See https://github.com/devfile/devworkspace-operator/issues/1324
However, I had an idea to force postStart events to always succeed, though this could lead to weird behaviour where the workspace starts up in an invalid/unexpected state. Something like:
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - '-c'
        - |
          {
          command1
          command2
          command3
          } 1> /tmp/poststart-stdout.txt 2> /tmp/poststart-stderr.txt || true # Force postStart event to succeed with || true
IMO if this route were taken, we'd want the editor to display a notification to the user that some postStart events failed.
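A rough local sketch of the `|| true` idea, under the same assumptions as the snippet in the comment above: `OUT_DIR` is a hypothetical scratch directory standing in for /tmp, and the two commands inside the braces are made-up placeholders for command1..command3.

```shell
#!/bin/sh
# Local sketch of forcing the hook to succeed with "|| true".
# OUT_DIR is a hypothetical stand-in for the container's /tmp.
OUT_DIR="$(mktemp -d)"
export OUT_DIR

STATUS=0
sh -c '{
echo "command1 ok"
ls /nonexistent-path-for-demo
} 1>"$OUT_DIR/poststart-stdout.txt" 2>"$OUT_DIR/poststart-stderr.txt" || true' || STATUS=$?

# The failing ls makes the brace group fail, but "|| true" turns that into success.
echo "hook exit status: $STATUS"
# The error text is still in the stderr file for later inspection.
echo "contents of poststart-stderr.txt: $(cat "$OUT_DIR/poststart-stderr.txt")"
```

One caveat, as far as I can tell: a brace group runs in the current shell, so a command that calls `exit` directly (like the `exit 1` in the reproducer devfile) ends the shell in bash and dash before `|| true` is evaluated, and the hook would still fail in that case.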
I see that in https://github.com/devfile/devworkspace-operator/pull/425 the controller.devfile.io/debug-start annotation was added to aid in debugging failed DevWorkspaces:
metadata:
  annotations:
    controller.devfile.io/debug-start: "true"
When this is enabled, the Pod for the failed DevWorkspace is not terminated immediately, which allows time to debug and check the oc describe pod output.
This seems to help for scenarios when main component container commands fail, but not for postStart hook commands.
Closed via https://github.com/devfile/devworkspace-operator/pull/1522