Unable to go from FASTQ to Salmon Quantification

Open SuhasSrinivasan opened this issue 7 months ago • 0 comments

Description of the bug

Dear Researchers and Developers,

Thank you for developing this pipeline.

I am trying to go from FASTQ files to Salmon pseudo-alignment and quantification, following only Phases 1 and 3 of the flow chart: https://raw.githubusercontent.com/nf-core/rnaseq/3.14.0//docs/images/nf-core-rnaseq_metro_map_grey.png

Specifically, I am trying to run:

  1. Infer strandedness
  2. FastQC
  3. FastP/TrimGalore
  4. FastQC
  5. SortMeRNA
  6. Salmon (pseudo-alignment and quantification)
  7. MultiQC on FastQC output

Issue 1:

Despite supplying a pre-built decoy-aware Salmon index for transcripts, both the genome FASTA and GTF files are still required. It is not clear why they are needed.

Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file.
No GTF or GFF3 annotation specified! The pipeline requires at least one of these files.

Issue 2:

The fq subsample step is run. I am not sure whether it is necessary for Salmon to infer strandedness.
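
For context, here is a minimal sketch of the samplesheet layout, assuming the standard nf-core/rnaseq columns. As far as I understand from the pipeline documentation, a strandedness value of auto is what leads the pipeline to subsample reads and run Salmon to infer strandedness. The FASTQ paths below are hypothetical placeholders, not the files used in this run; only the sample ID is taken from the log output in this report.

```shell
#!/usr/bin/env bash
# Sketch of a samplesheet in the nf-core/rnaseq column layout.
# The FASTQ paths are hypothetical placeholders (assumption);
# only the sample ID ERR2179089 comes from the log output in this report.
set -euo pipefail

cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness
ERR2179089,/path/to/ERR2179089_1.fastq.gz,/path/to/ERR2179089_2.fastq.gz,auto
EOF

# List the samples that request automatic strandedness inference.
awk -F',' 'NR > 1 && $4 == "auto" { print $1 }' samplesheet.csv
```

If the library strandedness is already known, the same column accepts unstranded, forward, or reverse instead of auto.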

Issue 3:

At some point in the pipeline, there is a failure due to an RSEM error. It is not clear why RSEM is being called for the reference genome when it is not part of Phases 1 and 3.

process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA (rsem/GRCh38.primary_assembly.genome.fa) [  0%] 0 of 1

Issue 4:

The pipeline does not stop at Salmon quantification and tries to continue to unexpected next steps.

[78/cab4d8] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:SALMON_QUANT (ERR2179089)                             [100%] 1 of 1 ✔
[78/c6d1b3] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE (gencode.v46.primary_assembly.annotation.gtf) [100%] 1 of 1 ✔
[8d/caed0e] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TXIMPORT                                              [100%] 1 of 1, failed: 1 ✘
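
The error message from the failed TXIMPORT task is not included above. A minimal sketch for retrieving it, assuming the default Nextflow work directory (./work) and the standard per-task files .command.err and .command.log; show_task_log is a hypothetical helper name, not part of Nextflow or the pipeline:

```shell
#!/usr/bin/env bash
# Print the captured stderr and log of a Nextflow task, given the short
# hash shown in the progress output (e.g. 8d/caed0e).
# show_task_log is a hypothetical helper; the work directory defaults
# to ./work, which is an assumption about how the run was launched.
show_task_log() {
    local hash="$1"
    local work_dir="${2:-work}"
    local dir f
    # The progress output shows only a prefix of the task directory name,
    # so expand it with a glob.
    for dir in "$work_dir"/"$hash"*/; do
        [ -d "$dir" ] || continue
        echo "== $dir"
        for f in .command.err .command.log; do
            # Only print files that exist and are non-empty.
            if [ -s "$dir$f" ]; then
                echo "-- $f"
                cat "$dir$f"
            fi
        done
    done
}
```

For the failure shown above, this would be called from the launch directory as show_task_log 8d/caed0e.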

It would be very helpful to know which switches need to be toggled to execute only Steps 1–7. Thank you for your consideration.

Command used and terminal output

nextflow run nf-core/rnaseq \
    --input samplesheet.csv \
    --outdir ~/bioinformatics/output/salmon/ \
    --fasta ~/bioinformatics/references/salmon_hs/GRCh38.primary_assembly.genome.fa.gz \
    --gtf ~/bioinformatics/references/salmon_hs/gencode.v46.primary_assembly.annotation.gtf.gz \
    --gencode \
    --trimmer fastp \
    --salmon_index ~/bioinformatics/references/salmon_hs/index/ \
    --pseudo_aligner salmon \
    --skip_gtf_filter \
    --skip_gtf_transc \
    --skip_umi_extract \
    --skip_bbsplit \
    --skip_alignment \
    --skip_markduplic \
    --skip_bigwig \
    --skip_stringtie \
    --skip_preseq \
    --skip_dupradar \
    --skip_qualimap \
    --skip_rseqc \
    --skip_biotype_qc \
    --skip_deseq2_qc \
    --skip_multiqc \
    --max_memory 100.GB \
    --max_cpus 24

System information

Nextflow: 24.04.2
Hardware: Desktop
Executor: local
Container: conda
OS: Ubuntu 22.04.4 LTS
nf-core/rnaseq v3.14.0-gb89fac3

SuhasSrinivasan, Jul 02 '24 07:07