rnaseq
Execution time does not change even if some QC tools are skipped
Description of the bug
I have been testing the pipeline on an HPC cluster. Even with 32 threads, a single paired-end FASTQ sample took approximately 3h 15min. To reduce this time (we have 350 samples to process in total), I skipped some quality control tools (preseq, biotype_qc, dupradar, stringtie and deseq2_qc) and the pseudo-alignment with salmon. Even so, the execution time hardly changed. This raises several questions for me:
- Any idea what this might be due to?
- Is it not possible to go below the 3h execution time for this pipeline?
- What are the most demanding processes/tools?
Thanks in advance!
Command used and terminal output
#### COMMAND USED ####
nextflow run nf-core/rnaseq -r 3.8.1 --input ${SAMPLESHEET} -profile singularity --max_cpus 32 --genome GRCh38 --fasta ${REFERENCE_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --gtf ${REFERENCE_DIR}/gencode.v41.annotation.gtf.gz --gencode --star_index ${STAR_INDEX} --aligner star_salmon --skip_preseq --skip_biotype_qc --skip_dupradar --skip_stringtie --skip_deseq2_qc --outdir $OUTDIR
#### OUTPUT ####
nf-core/rnaseq v3.8.1
------------------------------------------------------
Core Nextflow options
revision : 3.8.1
runName : pedantic_brown
containerEngine: singularity
launchDir : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/1_src
workDir : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/1_src/work
projectDir : /home/usc/mg/mfp/.nextflow/assets/nf-core/rnaseq
userName : uscmgmfp
profile : singularity
configFiles : /home/usc/mg/mfp/.nextflow/assets/nf-core/rnaseq/nextflow.config
Input/output options
input : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/1_src/samplesheet_rna.csv
outdir : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/3_results/1_URONCOLOGY-RNA-PE
email : [email protected]
Reference genome options
genome : GRCh38
fasta : /mnt/netapp2/translational_oncology/0_reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
gtf : /mnt/netapp2/translational_oncology/0_reference/gencode.v41.annotation.gtf.gz
gene_bed : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed
star_index : /mnt/netapp2/translational_oncology/1_tools/3_nf_core_rnaseq/2_data/1_star_reference
gencode : true
Process skipping options
skip_stringtie : true
skip_preseq : true
skip_dupradar : true
skip_biotype_qc: true
skip_deseq2_qc : true
Max job request options
max_cpus : 32
------------------------------------------------------
Completed at: 22-Sep-2022 11:59:44
Duration : 3h 26m 39s
CPU hours : 34.4
Succeeded : 41
Relevant files
No response
System information
- Nextflow Version 22.04.5
- Executed in Finisterrae III HPC with slurm and Linux
- Singularity container
- Version of nf-core/rnaseq 3.8.1
Hi @mimifp ! Are you able to send the .nextflow.log files for the runs before and after you used all of the skip steps?
There should also be some files in results/pipeline_info/execution_* that will give you an idea as to how long each process took and an overall summary.
I'd be interested to see how these compare before and after.
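To make that comparison concrete, here is a minimal Python sketch that ranks processes by total run time from a Nextflow trace file. It assumes the default tab-separated trace format with `name` and `realtime` columns and human-readable durations such as `1h 2m 3s`. The file name pattern `execution_trace_*.txt` and the helper names (`parse_duration`, `process_name`, `summarize`) are assumptions for illustration, not part of the pipeline.

```python
import csv
import io
import re
import sys
from collections import defaultdict

# Seconds per unit for durations written like "1h 2m 3s" or "450ms".
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def parse_duration(text):
    """Convert a human-readable duration to seconds (0.0 if nothing parses)."""
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)\s*(ms|s|m|h|d)", text):
        total += float(value) * _UNIT_SECONDS[unit]
    return total


def process_name(task_name):
    """Reduce 'A:B:STAR_ALIGN (sample1)' to 'STAR_ALIGN'."""
    base = task_name.split(" (")[0]
    return base.split(":")[-1]


def summarize(trace_text):
    """Return (process, total_seconds) pairs, slowest process first."""
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    for row in reader:
        totals[process_name(row["name"])] += parse_duration(row["realtime"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python trace_summary.py <path to execution_trace_*.txt>
    with open(sys.argv[1], newline="") as handle:
        for name, seconds in summarize(handle.read()):
            print(f"{name}\t{seconds / 60:.1f} min")
```

Running this on the trace from each run and comparing the top few entries should show whether the skipped tools were ever among the slowest steps.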
Hi @drpatelh, first of all, thanks for the answer and sorry for the delay! I ran the pipeline again, once with all tools and once with some tools skipped, and the times varied a little but not much:
All tools:
Completed at: 03-Oct-2022 12:19:04
Duration : 3h 9m 7s
CPU hours : 35.3
Succeeded : 47
Some tools:
Completed at: 30-Sep-2022 17:41:20
Duration : 2h 53m 21s
CPU hours : 29.8
Succeeded : 36
As for the execution info files, I have attached them for both runs.
Thanks!
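For reference, the relative savings between the two runs reported above can be worked out with a short Python sketch. The durations and CPU hours are taken from the two summaries in this comment. The helper names `to_seconds` and `percent_reduction` are made up for illustration.

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def percent_reduction(before, after):
    """Relative saving of `after` compared with `before`, in percent."""
    return 100.0 * (before - after) / before


# Wall-clock durations reported for the two runs.
all_tools_wall = to_seconds(3, 9, 7)      # "All tools": 3h 9m 7s
some_tools_wall = to_seconds(2, 53, 21)   # "Some tools": 2h 53m 21s

wall_saved = percent_reduction(all_tools_wall, some_tools_wall)
cpu_saved = percent_reduction(35.3, 29.8)  # CPU hours reported for each run

print(f"Wall time reduced by {wall_saved:.1f}%")  # prints 8.3%
print(f"CPU hours reduced by {cpu_saved:.1f}%")   # prints 15.6%
```

CPU hours drop roughly twice as much as wall time. One possible reading is that the skipped tools mostly ran in parallel with the longest steps instead of extending the total duration, although the per-process trace would be needed to confirm that.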
When I run -profile test and skip most of the tools, the running time drops by about 33% on AWS Batch. I think I'm going to close this issue, but feel free to reopen it if you have any more questions.
We could add a 'streamlined' profile which only runs the fastest tools for a minimum turnaround time.