Failed deployments show the head of the Groovy script output, not the tail
I know that, by design, there is a limit on how much data we can get from the script's output, and I'm fine with that limitation. The problem, however, is that when a deployment fails, the output we see is not relevant and is usually misleading. Getting the tail of the output, where the error occurred, would be much more valuable. This could be implemented as an option/flag so that the current behavior can be kept.
Our current workaround is to 'tee' the deployment output to a log file and add links to the agent's view, but it still requires a few clicks.
I am not sure I understand the "issue". Can you please give an example?
When we deploy/redeploy and there is an error, the deployment view shows the head of the Groovy script output:
Run [install] phase for [/ed/celery] on [i-03dc8530] - 2s
- script=IgGluScript [/ed/celery], action=install
- [org.linkedin.glu.agent.api.ShellExecException]: Error while executing command wget -O /mnt/ig/ed/celery/glu.py https://s3-us-west-2.amazonaws.com/glu.py 2>&1 | tee -a /mnt/ig/ed/celery/glu.log ; [[ ${PIPESTATUS[0]} == 0 ]]: res=1 - output=--2012-10-22 11:53:44-- https://s3-us-west-2.amazonaws.com/glu.py Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 205.251.235.151 Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|205.251.235.151|:443... connected. HTTP request sent, awaiting response... 404 Not Found 2012-10-22 11:53:44 ERROR 404: Not Found. - error=
This helps, but the output is usually much longer than the allowed buffer (for good reasons), and we would much rather have the tail of it, since errors usually show up at the end. The example above is a 'bad' one since I made it fail early; oftentimes the output in the view does not have any relevant info on the failure.
What happens if you click through to the agent page and select the 'view error' link? This is supposed to make a call to the agent to fetch the full error... Does this output contain enough data?
No. When I click on 'View Full Stack Trace', I do get the full Java stack trace, but with the same truncated stdout/stderr (I counted around 470 characters). It is a little better here because the carriage returns are displayed, so it is easier to read in that respect, but it is mixed with 10 times as many lines of stack trace that are not related to the actual issue. We are still missing the actual tail of stdout/stderr. Our workaround is to 'tee' the output to a log file, which has all the lines but is an ephemeral file by design (not recorded in the archived logs) and requires multiple clicks to get to.
I understand that this is probably not what you are looking for, but any variable in your script whose name ends with Log will be displayed as a shortcut in the UI. Example:
class MyGluScript {
  // any field whose name ends with "Log" is shown as a shortcut in the UI
  def xxxLog

  def install = {
    xxxLog = ....
  }
}
Then in the UI you will see a button for viewing (tailing) the xxx log file.
Also, from an automation point of view, there should be a way to tail any log file, as you can already do in the UI by clicking on the button I mentioned (this is missing from the console REST API). In the meantime, you can easily build the UI link automatically: if you use the xxxLog field, you can read it from the live model and build the link that will tail the log file.
Yes, we are using the custom log links at the moment, and that helps a lot, but it still requires two clicks, while we are looking to have the output inlined into the deployment view so that we can see at a glance why things are failing. This is especially useful when multiple components/agents are failing, since our current implementation requires two clicks per failed entry.
I plan on updating our Groovy script to wrap our shell calls with a custom function that will follow a non-zero exit code with a 'tail -c 400 glu.log' (show the last 400 chars of the log file) as a workaround. This won't be perfect, which is why I filed this feature request: I think the tail of the output is more useful than the head on errors.
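For illustration, here is a minimal shell sketch of that kind of wrapper. The function name run_logged and the TAIL_BYTES variable are hypothetical, not part of glu. It also assumes the full output is redirected to the log file rather than tee'd, so that on failure the only captured output is the tail of the log.

```sh
# Hypothetical helper (not part of glu): run a command with its combined
# stdout/stderr appended to a log file; on a non-zero exit code, print only
# the last bytes of that log so the failure shows up in the captured output.
run_logged() {
  rl_log="$1"   # path of the log file to append to
  shift         # the remaining arguments are the command to run

  rl_status=0
  # Send everything to the log instead of the caller's stdout/stderr.
  "$@" >>"$rl_log" 2>&1 || rl_status=$?

  if [ "$rl_status" -ne 0 ]; then
    # TAIL_BYTES is an assumed knob; 400 matches the figure mentioned above.
    echo "command failed (exit $rl_status); last ${TAIL_BYTES:-400} bytes of $rl_log:" >&2
    tail -c "${TAIL_BYTES:-400}" "$rl_log" >&2
  fi

  # Propagate the original exit code so the deployment still fails.
  return "$rl_status"
}
```

With the command from the example above, a call would look something like: run_logged /mnt/ig/ed/celery/glu.log wget -O /mnt/ig/ed/celery/glu.py https://s3-us-west-2.amazonaws.com/glu.py. Because the log is appended to, the tail can include output from earlier commands when the failing command prints very little.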
I am doing heavy work on stdout/stderr handling for commands. Once I am done, I will look at retrofitting some of this work for scripts.