amazon-genomics-cli
Task stdout not written to CloudWatch for MiniWDL tasks
Describe the Bug
When running a workflow using a MiniWDL context, any task that generates stdout does not have the contents written to a log stream in CloudWatch. Instead, the actual stdout is only accessible within the workflow as a file. The `stdout` log stream that a task reports will either be empty or contain only the contents that were written to the stderr of the task.
Steps to Reproduce
- Create a MiniWDL context
- From the AGC examples, run the `read` workflow using either AGC or the WES API
- Using the WES API (this is what was tested), retrieve the name of the log stream for the `read` task's stdout
- Retrieve the contents of that `stdout` log stream
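The WES lookup in the steps above can be sketched as follows. The field names (`task_logs`, `name`, `stdout`) follow the GA4GH WES `RunLog` schema returned by `GET /runs/{run_id}`, but the sample values are invented for illustration:

```python
# Sketch: extract the stdout location a WES server reports for one task.
# Field names follow the GA4GH WES RunLog schema; values below are made up.
from typing import Optional

# A pared-down RunLog response (sample values, not real log stream names).
run_log = {
    "run_id": "example-run-id",
    "task_logs": [
        {
            "name": "read",
            "stdout": "example-log-stream/stdout",
            "stderr": "example-log-stream/stderr",
        },
    ],
}

def task_stdout(run_log: dict, task_name: str) -> Optional[str]:
    """Return the stdout location reported for the named task, if any."""
    for task in run_log.get("task_logs", []):
        if task.get("name") == task_name:
            return task.get("stdout")
    return None
```

The bug reported here is that the log stream this lookup points at turns out to be empty (or to hold stderr content instead).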
Expected Behavior
- The string `Hello Amazon Genomics CLI!` is written to the console and reported to the `stdout` log stream
Actual Behavior
- The log stream is completely empty and does not include the expected string in `stdout`.
Additional Context
- The log stream names were retrieved via the WES API directly, and the streams were then viewed in CloudWatch
- When doing additional testing, anything I write to `stderr` from a task shows up in the `stdout` log stream of the task, not the `stderr` log stream

Looking at the CMD, you can see that stderr is tee'd to stderr.txt as well as to the console (i.e. to stdout), but stdout is not treated the same way:
```json
"cmd": [
  "/bin/bash",
  "-ec",
  "cd /mnt/efs/aeaebdf4-2809-4ac8-bfd6-1ee19c0f2eef/1/call-concat/work\nexit_code=0\nbash ../command >> ../stdout.txt 2> >(tee -a ../stderr.txt >&2) || exit_code=$?\nexit $exit_code"
]
```
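One way to treat stdout symmetrically would be to tee it into stdout.txt the same way stderr is tee'd into stderr.txt, so it also reaches the console (and hence whatever log driver is watching the container). The following is a self-contained sketch of that shell pattern, driven from Python for demonstration; it is not AGC's or miniwdl's actual code:

```python
# Sketch (not AGC's actual code): tee stdout into stdout.txt as well as the
# console, mirroring how the command above already handles stderr.
import os
import subprocess
import tempfile

script = (
    "cd {workdir}\n"
    "exit_code=0\n"
    "bash ./command > >(tee -a stdout.txt) 2> >(tee -a stderr.txt >&2) || exit_code=$?\n"
    "exit $exit_code"
)

with tempfile.TemporaryDirectory() as workdir:
    # A stand-in task command that writes one line to stdout.
    with open(os.path.join(workdir, "command"), "w") as f:
        f.write('echo "Hello Amazon Genomics CLI!"\n')
    result = subprocess.run(
        ["/bin/bash", "-ec", script.format(workdir=workdir)],
        capture_output=True,
        text=True,
    )
    with open(os.path.join(workdir, "stdout.txt")) as f:
        file_contents = f.read()
```

With this change the greeting appears both in `result.stdout` (what a console log driver would capture) and in `stdout.txt` (what `stdout()` resolves to inside the WDL task).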
Operating System:
AGC Version: 1.5.0
Was AGC setup with a custom bucket: No
Was AGC setup with a custom VPC: No
@mlin can you comment on what the expected behavior is?
My opinion (acknowledging that reasonable people can disagree) is that stdout should not be slurped into logs because it may be gigantic in idiomatic cases -- the WDL spec contains several examples where the full output of a command-line tool is captured with `output { File data = stdout() }`.
That stated, this is easier to swallow when working with miniwdl locally because you can easily go peek at the stdout.txt left behind on the local filesystem. If we're using AGC and remaining at arm's length from EFS (consuming only outputs uploaded to S3), then the stdout.txt files aren't readily accessible, which is a problem.
A couple of options are (1) providing an opt-in to shipping stdout [by changing the command Patrick showed above] or (2) changing the S3 upload wrapper to always upload stdout.txt along with the workflow & task log files.
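Option (2) amounts to extending the upload step to treat the per-task log files as first-class artifacts. A minimal sketch of the key layout such a wrapper could use follows; the prefix scheme and helper name here are assumptions for illustration, not miniwdl-aws's actual layout:

```python
# Sketch of option (2): compute S3 keys so an upload wrapper can ship
# stdout.txt/stderr.txt/command alongside workflow outputs. The key layout
# is an assumption for illustration, not miniwdl-aws's actual scheme.
from pathlib import PurePosixPath

TASK_LOG_FILES = ("stdout.txt", "stderr.txt", "command")

def task_log_keys(prefix, run_id, call_name):
    """Return (local_name, s3_key) pairs for one task's log files."""
    base = PurePosixPath(prefix) / run_id / call_name
    return [(name, str(base / name)) for name in TASK_LOG_FILES]

# Example with made-up identifiers:
keys = task_log_keys("runs", "aeaebdf4", "call-concat")
```

An upload wrapper would then push each pair after the task finishes (e.g. with boto3's `S3.Client.upload_file`).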
cc @kankou-aliaksei as we've discussed this too
Having the stderr and stdout accessible as files in S3 would be my preferred option. Streaming the results from CloudWatch works; however, there is currently no easy way to determine which log group a specific log belongs to.
It's generally a good idea to be able to get the stderr, stdout, and command files associated with each task in S3. I think the second option @mlin mentioned is better.
I agree, I think that would be great and would definitely solve the use case we have
@mlin, @patmagee - I'm wondering if this request can be moved to the miniwdl or miniwdl-aws-ext project as I think it will require a change there rather than in AGC. The task in AGC would be to use this modified version of miniwdl once it is available.
sounds good to me if that is the appropriate place for this