
Task stdout not written to CloudWatch for MiniWDL tasks

Open patmagee opened this issue 1 year ago • 8 comments

Describe the Bug

When running a workflow in a MiniWDL context, any task that writes to stdout does not have that output written to a log stream in CloudWatch. Instead, the actual stdout is only accessible within the workflow as a file. The stdout log stream that a task reports either refers to an empty stream or contains only the contents that were written to the task's stderr.

Steps to Reproduce

  1. Create a MiniWDL context
  2. From the AGC examples, run the read workflow using either AGC or the WES API
  3. Using the WES API (this is what was tested), retrieve the name of the log stream for the read task's stdout
  4. Retrieve the contents of that log stream

Expected Behavior

  1. The log stream contains the string Hello Amazon Genomics CLI! written to the console and reported to the stdout log stream

Actual Behavior

  1. The log stream is completely empty and does not include the expected string in stdout.

Additional Context

  • The log stream names were retrieved directly via the WES API and then viewed in CloudWatch
  • In additional testing, writing to stderr from a task shows up in the task's stdout log stream rather than in the stderr log stream

Looking at the cmd, you can see that stderr is tee'd to stderr.txt as well as to the console (stdout), but stdout is not treated the same way:

"cmd": [
	"/bin/bash",
	"-ec",
	"cd /mnt/efs/aeaebdf4-2809-4ac8-bfd6-1ee19c0f2eef/1/call-concat/work\nexit_code=0\nbash ../command >> ../stdout.txt 2> >(tee -a ../stderr.txt >&2) || exit_code=$?\nexit $exit_code"
]
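For illustration, a symmetric treatment of stdout could tee it to stdout.txt and to the console, mirroring how the cmd above already handles stderr. This is a hypothetical sketch, not miniwdl's actual behavior; the file names and layout imitate the cmd shown, but everything is re-created here in a temp directory:

```shell
#!/bin/bash
# Hypothetical sketch: tee stdout to ../stdout.txt AND the console, mirroring
# the existing stderr handling. The ../command, ../stdout.txt, ../stderr.txt
# layout imitates the miniwdl cmd shown above.
set -e
tmpdir=$(mktemp -d)
mkdir "$tmpdir/work"
cd "$tmpdir/work"

# Stand-in for the generated task command file.
printf 'echo "hello stdout"\necho "hello stderr" >&2\n' > ../command

exit_code=0
# Current behavior redirects stdout only into the file (>> ../stdout.txt).
# Process substitution tees it to the console too, just like stderr:
bash ../command > >(tee -a ../stdout.txt) 2> >(tee -a ../stderr.txt >&2) || exit_code=$?
sleep 1  # give the background tee processes time to flush their files
```

With this change the "hello stdout" line would both land in stdout.txt and reach the container's console stream, where the CloudWatch Logs driver could pick it up.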

Operating System:
AGC Version: 1.5.0
Was AGC setup with a custom bucket: No
Was AGC setup with a custom VPC: No

patmagee avatar Aug 17 '22 14:08 patmagee

@mlin can you comment on what the expected behavior is?

markjschreiber avatar Aug 31 '22 16:08 markjschreiber

My opinion (acknowledging that reasonable people can disagree) is that stdout should not be slurped into logs because it may be gigantic in idiomatic cases -- the WDL spec contains several examples where the full output of a command line tool is captured with output { File data = stdout() }.

That said, this is easier to swallow when working with miniwdl locally, because you can easily go peek at the stdout.txt left behind on the local filesystem. If we're using AGC and staying at arm's length from EFS (consuming only the outputs uploaded to S3), then the stdout.txt files aren't readily accessible, which is a problem.

A couple of options are (1) provide an opt-in to shipping stdout to the logs [by changing the command Patrick showed above], or (2) change the S3 upload wrapper to always upload stdout.txt along with the workflow and task log files.
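Option (2) could be sketched as a small helper that collects the per-task log files destined for S3. The helper name `task_log_uploads` and the exact file list are assumptions for illustration, not the actual miniwdl-aws upload code:

```python
from pathlib import Path

# Files left behind in each task's run directory (per the cmd shown earlier).
# "stdout.txt" is the one this issue asks to ship to S3 alongside the others.
TASK_LOG_FILES = ("command", "stdout.txt", "stderr.txt")

def task_log_uploads(task_dir, s3_prefix):
    """Return (local_path, s3_key) pairs for the task log files that exist.

    Hypothetical helper: a caller would then run
    s3.upload_file(local, bucket, key) for each pair.
    """
    pairs = []
    for name in TASK_LOG_FILES:
        local = Path(task_dir) / name
        if local.exists():
            pairs.append((str(local), f"{s3_prefix.rstrip('/')}/{name}"))
    return pairs
```

For example, a task directory containing command and stdout.txt, paired with a prefix like "runs/abc/call-concat/", would yield two uploads whose keys sit next to the existing workflow and task log files.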

mlin avatar Aug 31 '22 18:08 mlin

cc @kankou-aliaksei as we've discussed this too

mlin avatar Aug 31 '22 18:08 mlin

Having the stderr and stdout accessible as files in S3 would be my preferred option. Streaming the results from CloudWatch works; however, there is currently no easy way to determine which log group a specific log stream belongs to.

patmagee avatar Sep 01 '22 11:09 patmagee

It's generally a good idea to be able to get the stderr, stdout, and command files associated with each task in S3. I think the second option @mlin mentioned is better.

kankou-aliaksei avatar Sep 01 '22 13:09 kankou-aliaksei

I agree, I think that would be great and would definitely solve the use case we have

patmagee avatar Sep 16 '22 13:09 patmagee

@mlin, @patmagee - I'm wondering if this request can be moved to the miniwdl or miniwdl-aws-ext project as I think it will require a change there rather than in AGC. The task in AGC would be to use this modified version of miniwdl once it is available.

markjschreiber avatar Sep 23 '22 14:09 markjschreiber

Sounds good to me if that is the appropriate place for this.

patmagee avatar Sep 23 '22 14:09 patmagee