"results" folder not stored when using nextflow engine

Open · MrOlm opened this issue 2 years ago · 1 comment

Discussed in https://github.com/aws/amazon-genomics-cli/discussions/481

Originally posted by MrOlm, June 13, 2022

Hello,

After a Nextflow run there is usually a "results" folder created, but that output is not uploaded to AWS at the end of a Nextflow run with the Amazon Genomics CLI.

For example, the rnaseq workflow provided as an example (https://github.com/aws/amazon-genomics-cli/tree/main/examples/demo-nextflow-project/workflows/rnaseq) results in the following output on my end:

(base) mattolm@mac-nugget:$ aws s3 ls agc-698960807664-us-west-2/project/NextflowDemo/userid/mattolm4x9RhE/context/spotContext/nextflow-execution/
                           PRE logs/
                           PRE runs/


(base) mattolm@mac-nugget:$ aws s3 ls agc-698960807664-us-west-2/project/NextflowDemo/userid/mattolm4x9RhE/context/spotContext/nextflow-execution/runs/
                           PRE 23/
                           PRE 5c/
                           PRE 62/
                           PRE fd/
                           PRE stage/
                           PRE tmp/
2022-06-13 14:11:23          0

And when I ask for the workflow output I just get the run-id:

(base) mattolm@mac-nugget:$ agc workflow output aca7054d-3261-4c29-8cee-53efb5aed86a
2022-06-13T14:23:29-07:00 𝒊  Obtaining final outputs for workflow runId 'aca7054d-3261-4c29-8cee-53efb5aed86a'

OUTPUT  id      aca7054d-3261-4c29-8cee-53efb5aed86a 

However, according to the actual repo (https://github.com/nextflow-io/rnaseq-nf), the run should create a "results" folder containing multiqc_report.html (results/multiqc_report.html). When I run the same workflow locally, as below, the results folder is indeed created:

(base) mattolm@mac-nugget:$ nextflow run https://github.com/nextflow-io/rnaseq-nf.git -resume -params-file ../workflows/rnaseq/inputs.json
N E X T F L O W  ~  version 21.10.6
Launching `nextflow-io/rnaseq-nf` [tiny_marconi] - revision: 37c5039435 [master]
 R N A S E Q - N F   P I P E L I N E
 ===================================
 transcriptome: /home/mattolm/.nextflow/assets/nextflow-io/rnaseq-nf/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa
 reads        : s3://1000genomes/phase3/data/HG00243/sequence_read/SRR*_{1,2}.filt.fastq.gz
 outdir       : results

Uploading local `bin` scripts folder to s3://agc-698960807664-us-west-2/project/NextflowDemo/userid/mattolm4x9RhE/context/spotContext/nextflow-execution/runs/tmp/59/64a5bb5c2c0bfb58a1bb7b8ec38a4e/bin
executor >  awsbatch (4)
[46/644f98] process > RNASEQ:INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔
[cc/41410f] process > RNASEQ:FASTQC (FASTQC on SRR099964)     [100%] 1 of 1 ✔
[63/f9c8dd] process > RNASEQ:QUANT (SRR099964)                [100%] 1 of 1 ✔
[da/a87450] process > MULTIQC                                 [100%] 1 of 1 ✔

Done! Open the following report in your browser --> results/multiqc_report.html

Completed at: 13-Jun-2022 15:22:30
Duration    : 26m 42s
CPU hours   : 0.7
Succeeded   : 4

(base) mattolm@mac-nugget:$ ls
nextflow.config  results  work

(base) mattolm@mac-nugget:$ ls results/
fastqc_SRR099964_logs  multiqc_report.html

It's not a big deal for a small workflow like this, but when I ran a larger nf-core workflow (https://nf-co.re/mag?q=mag), which is supposed to create a beautiful "results" folder, only the messy "runs" folder shown above was uploaded at the end.

Thank you in advance for your help, MO

MrOlm, Jun 13 '22 22:06

This is prohibiting regular/routine use of AGC for us as well. Publishing outputs in a specific directory to rename and organise results is a key feature for many NF pipelines and we'd love to see this handled by AGC.

scwatts, Oct 21 '22 01:10
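
A possible workaround, which nobody in this thread has confirmed: Nextflow's publishDir accepts an s3:// URI, so pointing the pipeline's outdir parameter (the "outdir : results" line in the log above) at S3 should make the published files land in a bucket rather than in a local folder on the machine running Nextflow. The sketch below only writes a small params fragment. The bucket name and the file name outdir-override.json are hypothetical, and it assumes the AGC context is allowed to write to that bucket.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical destination: replace with an S3 location the AGC context can write to.
OUTDIR="s3://my-results-bucket/rnaseq-nf/results"

# Write a params fragment that overrides only `outdir`, the rnaseq-nf parameter
# printed as "outdir : results" in the run log above.
cat > outdir-override.json <<EOF
{
  "outdir": "${OUTDIR}"
}
EOF

# Show the fragment so it can be merged into the workflow's inputs.json by hand.
cat outdir-override.json
```

Merging the "outdir" key into the inputs.json used for the run would then apply the override. Whether AGC lists that location in the output of agc workflow output is not something this thread establishes.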