rnaseq
Unable to go from FASTQ to Salmon Quantification
Description of the bug
Dear Researchers and Developers,
Thank you for developing this pipeline.
I am trying to go from FASTQ files to Salmon pseudo-alignment and quantification, as per the flow chart (Phase 1 and 3 only): https://raw.githubusercontent.com/nf-core/rnaseq/3.14.0//docs/images/nf-core-rnaseq_metro_map_grey.png
Specifically, trying to achieve:
- Infer strandedness
- FastQC
- FastP/TrimGalore
- FastQC
- SortMeRNA
- Salmon (pseudo-alignment and quantification)
- MultiQC on FastQC output
Issue 1:
Despite my supplying a pre-built, decoy-aware Salmon index for the transcripts, the pipeline still requires both a genome FASTA file and a GTF file. It is not clear why these are needed. Omitting them produces the following errors:
Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file.
No GTF or GFF3 annotation specified! The pipeline requires at least one of these files.
Issue 2:
The fq subsample step is run; I am not sure whether this is necessary for Salmon to infer strandedness.
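One way to check what Salmon itself inferred is to read the `expected_format` field of the `lib_format_counts.json` file that Salmon writes into each quantification directory. The helper below is only a sketch: the name `salmon_libtype` is hypothetical, and where the pipeline places the per-sample Salmon directory under `--outdir` is an assumption that should be checked against the actual output.

```shell
# Hypothetical helper (not part of the pipeline): print the library type
# that Salmon recorded for one quantification directory.
salmon_libtype() {
    # $1: path to a Salmon output directory, assumed to contain
    #     lib_format_counts.json
    sed -n 's/.*"expected_format"[[:space:]]*:[[:space:]]*"\([A-Za-z]*\)".*/\1/p' \
        "$1/lib_format_counts.json"
}

# Example call; the directory layout below is an assumption:
#   salmon_libtype ~/bioinformatics/output/salmon/salmon/ERR2179089
```

A value such as `ISR` or `ISF` would indicate a stranded library, while `IU` would indicate an unstranded one.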
Issue 3:
At some point in the pipeline, there is a failure due to an RSEM error. It is not clear why RSEM is being called on the reference genome, when it is not part of Phases 1 and 3.
process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA (rsem/GRCh38.primary_assembly.genome.fa) [ 0%] 0 of 1
Issue 4:
The pipeline does not stop at Salmon quantification and tries to continue to unexpected next steps.
[78/cab4d8] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:SALMON_QUANT (ERR2179089) [100%] 1 of 1 ✔
[78/c6d1b3] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE (gencode.v46.primary_assembly.annotation.gtf) [100%] 1 of 1 ✔
[8d/caed0e] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TXIMPORT [100%] 1 of 1, failed: 1 ✘
It would be very helpful to know which switches need to be toggled to execute only Steps 1–7. Thank you for your consideration.
- Infer strandedness
- FastQC
- FastP/TrimGalore
- FastQC
- SortMeRNA
- Salmon (pseudo-alignment and quantification)
- MultiQC on FastQC output
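For the TXIMPORT failure under Issue 4, the underlying error should be recorded in the task's work directory, since Nextflow keeps `.command.sh`, `.command.log` and `.command.err` for every task. The sketch below resolves the short hash shown in the terminal output (for example `8d/caed0e`) to the full directory. The function name `task_dirs` is hypothetical, and the work root is assumed to be the Nextflow default `./work` relative to the launch directory.

```shell
# Hypothetical helper: list the Nextflow task directories under a work root
# whose path starts with a given short hash.
task_dirs() {
    # $1: work root (assumed to be ./work, the Nextflow default)
    # $2: short hash as printed in the run log, e.g. 8d/caed0e
    for d in "$1"/"$2"*/; do
        # An unmatched glob stays literal, so keep only real directories.
        if [ -d "$d" ]; then
            printf '%s\n' "${d%/}"
        fi
    done
    return 0
}

# Example call, run from the launch directory:
#   cat "$(task_dirs work 8d/caed0e)/.command.err"
```

The contents of `.command.err` and `.command.log` from that directory would show why TXIMPORT exited with an error.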
Command used and terminal output
nextflow run nf-core/rnaseq \
--input samplesheet.csv \
--outdir ~/bioinformatics/output/salmon/ \
--fasta ~/bioinformatics/references/salmon_hs/GRCh38.primary_assembly.genome.fa.gz \
--gtf ~/bioinformatics/references/salmon_hs/gencode.v46.primary_assembly.annotation.gtf.gz \
--gencode \
--trimmer fastp \
--salmon_index ~/bioinformatics/references/salmon_hs/index/ \
--pseudo_aligner salmon \
--skip_gtf_filter \
--skip_gtf_transc \
--skip_umi_extract \
--skip_bbsplit \
--skip_alignment \
--skip_markduplic \
--skip_bigwig \
--skip_stringtie \
--skip_preseq \
--skip_dupradar \
--skip_qualimap \
--skip_rseqc \
--skip_biotype_qc \
--skip_deseq2_qc \
--skip_multiqc \
--max_memory 100.GB \
--max_cpus 24 \
System information
- Nextflow: 24.04.2
- Hardware: Desktop
- Executor: local
- Container: conda
- OS: Ubuntu 22.04.4 LTS
- nf-core/rnaseq: v3.14.0-gb89fac3