amazon-genomics-cli
Support concurrent workflows with different manifests
As described here, I’d like AGC to support running multiple workflows in the same context:
…the MANIFEST.json with a per-workflow inputFileURLs specified in the MANIFEST.json is packaged with the workflow (workflow.zip). The workflow.zip is 1:1 with the context, and not with the agc workflow run. This is certainly unexpected, as I'd expect that agc workflow run would not overwrite any previous definition of the workflow. So I think this is a bigger issue than just convenience.
The issue is that the inputs can be specified in the Manifest, so if we modify the Manifest for a second set of inputs, the first workflow zip file gets overwritten.
How is the workflow zip file used? Does AGC only use it at the beginning of the workflow execution or does it access it throughout its lifetime? I assume you know/suspect it's the latter @nh13?
The workflow.zip is only used as part of the submission. Engine adapters unpack the zip file and submit the contents to the engine. The zip is otherwise not referenced while the workflow is running.
There is still a rare chance for a race condition if the same workflow is submitted at exactly the same time from two different environments.
I have hit this race condition submitting a workflow per sample in succession.
Thank you for the clarification @wleepang. I confirm that I have been able to successfully run the same workflow for four samples concurrently by leaving ~5 mins between agc workflow run executions.
This may not be acceptable for @nh13's use case, though.
I have run into this when submitting runs in parallel. I think a very simple random string appended to the end of the workflow.zip filename would fix this entirely.
@wleepang looking into the cromwell adapter, it seems like there is a hardcoded expectation that the submitted zip file is called workflow.zip; otherwise, it simply passes the url through to cromwell as the workflowUrl. If this check simply looked for *.zip files, agc could append any random string to the name of the file to ensure there is no race condition, e.g. workflow-166196782808.zip
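A rough sketch of that idea in Python, for illustration only. The function names `unique_zip_name` and `is_workflow_zip` are hypothetical and are not taken from the AGC or adapter source; the sketch only assumes that the uploader picks the object name and that the adapter decides from the URL whether it was handed a zip.

```python
import secrets
import time
from urllib.parse import urlparse


def unique_zip_name(prefix: str = "workflow") -> str:
    """Build a per-submission name such as 'workflow-1661967828-9f2c4e1a.zip'.

    Hypothetical helper: a timestamp alone can collide when two runs are
    submitted in the same second, so a short random suffix is appended.
    """
    return f"{prefix}-{int(time.time())}-{secrets.token_hex(4)}.zip"


def is_workflow_zip(url: str) -> bool:
    """Treat any '*.zip' object as a packaged workflow.

    Hypothetical replacement for a check that only matches 'workflow.zip'.
    """
    return urlparse(url).path.lower().endswith(".zip")
```

With something like this, each agc workflow run would upload to its own key, so a later submission could not overwrite the zip of an earlier one, and the adapter would still recognise the upload as a zip.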