amazon-genomics-cli
"results" folder not stored when using nextflow engine
Discussed in https://github.com/aws/amazon-genomics-cli/discussions/481
Originally posted by MrOlm June 13, 2022
Hello,
After a Nextflow run there is usually a "results" folder created, but that output is not uploaded to AWS at the end of a Nextflow run with Amazon Genomics CLI.
For example, the rnaseq workflow provided as an example (https://github.com/aws/amazon-genomics-cli/tree/main/examples/demo-nextflow-project/workflows/rnaseq) results in the following output on my end:
(base) mattolm@mac-nugget:$ aws s3 ls agc-698960807664-us-west-2/project/NextflowDemo/userid/mattolm4x9RhE/context/spotContext/nextflow-execution/
PRE logs/
PRE runs/
(base) mattolm@mac-nugget:$ aws s3 ls agc-698960807664-us-west-2/project/NextflowDemo/userid/mattolm4x9RhE/context/spotContext/nextflow-execution/runs/
PRE 23/
PRE 5c/
PRE 62/
PRE fd/
PRE stage/
PRE tmp/
2022-06-13 14:11:23 0
And when I ask for the workflow output I just get the run-id:
(base) mattolm@mac-nugget:$ agc workflow output aca7054d-3261-4c29-8cee-53efb5aed86a
2022-06-13T14:23:29-07:00 𝒊 Obtaining final outputs for workflow runId 'aca7054d-3261-4c29-8cee-53efb5aed86a'
OUTPUT id aca7054d-3261-4c29-8cee-53efb5aed86a
However, according to the actual repo (https://github.com/nextflow-io/rnaseq-nf), there should be a folder and file created called results/multiqc_report.html. When I run this same command locally, as below, the results folder is indeed created:
(base) mattolm@mac-nugget:$ nextflow run https://github.com/nextflow-io/rnaseq-nf.git -resume -params-file ../workflows/rnaseq/inputs.json
N E X T F L O W ~ version 21.10.6
Launching `nextflow-io/rnaseq-nf` [tiny_marconi] - revision: 37c5039435 [master]
R N A S E Q - N F P I P E L I N E
===================================
transcriptome: /home/mattolm/.nextflow/assets/nextflow-io/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
reads : s3://1000genomes/phase3/data/HG00243/sequence_read/SRR*_{1,2}.filt.fastq.gz
outdir : results
Uploading local `bin` scripts folder to s3://agc-698960807664-us-west-2/project/NextflowDemo/userid/mattolm4x9RhE/context/spotContext/nextflow-execution/runs/tmp/59/64a5bb5c2c0bfb58a1bb7b8ec38a4e/bin
executor > awsbatch (4)
[46/644f98] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔
[cc/41410f] process > RNASEQ:FASTQC (FASTQC on SRR099964) [100%] 1 of 1 ✔
[63/f9c8dd] process > RNASEQ:QUANT (SRR099964) [100%] 1 of 1 ✔
[da/a87450] process > MULTIQC [100%] 1 of 1 ✔
Done! Open the following report in your browser --> results/multiqc_report.html
Completed at: 13-Jun-2022 15:22:30
Duration : 26m 42s
CPU hours : 0.7
Succeeded : 4
(base) mattolm@mac-nugget:$ ls
nextflow.config results work
(base) mattolm@mac-nugget:$ ls results/
fastqc_SRR099964_logs multiqc_report.html
It's not a big deal for a small workflow like this, but when I ran a larger nf-core workflow (https://nf-co.re/mag?q=mag), which is supposed to create a beautiful "results" folder, only the messy "runs" folder above was uploaded at the end.
Thank you in advance for your help, MO
This is prohibiting regular/routine use of AGC for us as well. Publishing outputs in a specific directory to rename and organise results is a key feature for many NF pipelines and we'd love to see this handled by AGC.