SALMON_TX2GENE failing when skip_alignment is true and pseudo_aligner is salmon

Open JSchoenbachler opened this issue 3 years ago • 9 comments

Check Documentation

I have checked the following places for your error:

Description of the bug

When using the flags to skip alignment and use salmon as a pseudo aligner (--skip_alignment --pseudo_aligner salmon) I get the following error:

Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TX2GENE (genes.gtf)'

Caused by:
  Missing output file(s) `*.tsv` expected by process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TX2GENE (genes.gtf)`

Command executed:

  salmon_tx2gene.py \
      --gtf genes.gtf \
      --salmon salmon \
      --id gene_id \
      --extra gene_name \
      -o salmon_tx2gene.tsv
  
  cat <<-END_VERSIONS > versions.yml
  SALMON_TX2GENE:
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

However, simply removing those flags makes the command execute successfully. I have tested this using data for mouse and human genomes.

Steps to reproduce

  1. Create the following as ids.txt:
SRR10877044
SRR10877043
  2. Run the fetchngs command to get the samplesheet:
nextflow run nf-core/fetchngs --input ids.txt --max_memory 60.GB --max_cpus 12 -r 1.3
  3. Modify samplesheet to remove extra columns and add strandedness:
sample,fastq_1,fastq_2,strandedness
SRX7546616,./results/fastq/SRX7546616_T1_1.fastq.gz,./results/fastq/SRX7546616_T1_2.fastq.gz,reverse
SRX7546617,./results/fastq/SRX7546617_T1_1.fastq.gz,./results/fastq/SRX7546617_T1_2.fastq.gz,reverse
  4. Run the rnaseq command:
nextflow run nf-core/rnaseq \
    --input samplesheet.csv \
    --genome GRCm38 \
    --max_memory 60.GB \
    --max_cpus 12 \
    --outdir results2 \
    --skip_markduplicates \
    --skip_bigwig \
    --skip_stringtie \
    --skip_preseq \
    --skip_dupradar \
    --skip_qualimap \
    --skip_rseqc \
    --skip_biotype_qc \
    --skip_deseq2_qc \
    --salmon_index "/home/ubuntu/genomes/alias/mm10/salmon_partial_sa_index/default" \
    --skip_alignment \
    --pseudo_aligner salmon \
    -profile docker \
    -r 3.4

Again, the above uses a mouse genome, but the same behavior occurs when tested with a human genome.
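
For anyone repeating step 3, the samplesheet edit could be scripted rather than done by hand. The following is a minimal sketch, assuming the fetchngs samplesheet already contains `sample`, `fastq_1` and `fastq_2` columns alongside the extra ones. The function name `trim_samplesheet` is hypothetical and not part of either pipeline.

```python
import csv
import io

# Columns kept from the fetchngs samplesheet (assumed to be present in it).
KEEP = ["sample", "fastq_1", "fastq_2"]


def trim_samplesheet(csv_text, strandedness="reverse"):
    """Drop every column except KEEP and append a strandedness column.

    Hypothetical helper for illustration; it takes and returns CSV text.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=KEEP + ["strandedness"], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        trimmed = {col: row[col] for col in KEEP}
        trimmed["strandedness"] = strandedness  # same value for every sample
        writer.writerow(trimmed)
    return out.getvalue()
```

Reading the fetchngs samplesheet with `open(...).read()`, passing the text through `trim_samplesheet`, and writing the result to `samplesheet.csv` should give a file in the four-column layout shown in step 3.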

Expected behaviour

The pipeline should succeed, and should actually run faster, since ideally it would simply be skipping the alignment step.

Log files

Have you provided the following extra information/files:

  • [X] The command used to run the pipeline
  • [ ] The .nextflow.log file (not provided, but available on request)

System

  • Hardware: Amazon EC2 m5.4xlarge (but also happens on smaller instances)
  • OS: Ubuntu
  • Version 20.04

Nextflow Installation

  • Version: 21.04.0

Container engine

  • Engine: Docker
  • version: 20.10.8

JSchoenbachler avatar Nov 19 '21 16:11 JSchoenbachler

Hi @JSchoenbachler ! Hope you are well and apologies for the delay. I have tried to reproduce the error you reported but haven't had much luck. Be great if you can try to reproduce using the following minimal test example please:

nextflow pull nf-core/rnaseq 

nextflow run nf-core/rnaseq \
    --input https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.4/samplesheet_test.csv \
    --fasta https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fa \
    --gtf https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf.gz \
    --skip_alignment \
    --pseudo_aligner salmon \
    -r dev \
    -profile singularity

Quite a lot has changed on the dev branch of late as we have overhauled the default NF syntax in the pipeline. I am quietly hoping this has been fixed 🤞🏽

drpatelh avatar Dec 13 '21 14:12 drpatelh

Hey @drpatelh , thanks for getting back to me!

I tried the command you provided above on the test samplesheet (only thing I did differently was to change profile to docker since that's how I'm running it) and it worked.

However, when I tried using -r dev on my command I originally provided above, I got the same error result. Did you try with my command and genome?

JSchoenbachler avatar Dec 13 '21 20:12 JSchoenbachler

Thanks for giving it a go @JSchoenbachler. I tried to replicate the command you used with a test dataset but if you are still getting the error then I will try with the full-sized data. Will report back. Thanks!

drpatelh avatar Dec 14 '21 09:12 drpatelh

@drpatelh okay thanks! I think everything I provided should be sufficient for debugging this, but if it isn't, let me know!

JSchoenbachler avatar Dec 14 '21 16:12 JSchoenbachler

Unable to replicate @JSchoenbachler 😏

I tried on a full-sized dataset on Nextflow Tower using the options below and it worked: --genome GRCh37 --skip_alignment --pseudo_aligner salmon -r dev

I also tried to reproduce the command to run locally using the smaller test data with the command below and that worked too:

nextflow run main.nf \
    --genome 'R64-1-1' \
    --input https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.4/samplesheet_test.csv \
    --skip_alignment \
    --pseudo_aligner salmon \
    -profile singularity

Be great if you are able to isolate the problem a little more or try and generate a smaller test we can use to reproduce. Will keep this open for now and bump to the next release milestone.
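
One way to narrow this down: the original command combined --genome GRCm38 with a prebuilt --salmon_index, which the test commands above do not. Whether that matters is not established in this thread. As a sketch of one possible check, under that assumption only, the snippet below counts how many transcript IDs in a Salmon `quant.sf` file also appear as `transcript_id` attributes in the GTF. All function names are hypothetical. It is not taken from `salmon_tx2gene.py`.

```python
import re

# GTF attribute of the form: transcript_id "ENSMUST00000000001";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines):
    """Collect transcript_id values from an iterable of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):  # skip GTF header/comment lines
            continue
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def quant_transcript_ids(quant_lines):
    """Collect the first (Name) column of a quant.sf file, skipping its header."""
    lines = iter(quant_lines)
    next(lines, None)  # header row: Name, Length, EffectiveLength, TPM, NumReads
    return {line.split("\t", 1)[0].strip() for line in lines if line.strip()}


def overlap_report(gtf_lines, quant_lines):
    """Count the IDs on each side and how many are shared."""
    gtf_ids = gtf_transcript_ids(gtf_lines)
    quant_ids = quant_transcript_ids(quant_lines)
    return {
        "gtf": len(gtf_ids),
        "quant": len(quant_ids),
        "shared": len(gtf_ids & quant_ids),
    }
```

Calling `overlap_report(open("genes.gtf"), open("quant.sf"))` with the files from the failing run (paths are placeholders) would show whether the two ID sets overlap at all. A `shared` count of zero would point towards the index and annotation coming from different sources, though it would not by itself prove that this causes the missing `*.tsv`.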

Thanks!

drpatelh avatar Dec 16 '21 20:12 drpatelh

@drpatelh That's strange. If you don't mind could you follow my steps in the "Steps to reproduce" section, including the data I use as a source? I provided the contents of my ids.txt and samplesheet.csv, so you should be able to take that and run it.

JSchoenbachler avatar Dec 16 '21 20:12 JSchoenbachler

Are you able to send me the .command.out and .command.err files in the work directory for the failing process? (and the .nextflow.log if you have it handy)? If it's sample related then there should be some sort of indication there.
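
If it helps with finding those files: each Nextflow task directory under `work/` holds a `.command.sh` alongside `.command.out` and `.command.err`. A rough sketch of locating the failing task by searching `.command.sh` for the script name from the error message, and copying the logs out with a `.txt` suffix so they are easier to attach, might look like the following. The function names are hypothetical, and the two-level `work/xx/hash` layout is an assumption about a default local run.

```python
from pathlib import Path
import shutil


def find_task_dirs(work_dir, needle="salmon_tx2gene.py"):
    """Return task directories whose .command.sh mentions `needle`."""
    hits = []
    for script in Path(work_dir).glob("*/*/.command.sh"):
        if needle in script.read_text(errors="replace"):
            hits.append(script.parent)
    return sorted(hits)


def collect_logs(task_dir, dest_dir, names=(".command.out", ".command.err")):
    """Copy the task's log files to dest_dir as e.g. command.err.txt."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = Path(task_dir) / name
        if src.exists():
            # Drop the leading dot and add .txt: .command.err -> command.err.txt
            target = dest / (name.lstrip(".") + ".txt")
            shutil.copyfile(src, target)
            copied.append(target)
    return copied
```

Running `find_task_dirs("work")` and then `collect_logs(task_dir, "logs_to_share")` for each result would gather the files requested here. A resumed or retried run may leave several matching directories, so the newest one is probably the one of interest.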

drpatelh avatar Dec 16 '21 20:12 drpatelh

Here are my files (since GitHub won't let me attach files with unsupported extensions): https://drive.google.com/drive/folders/1Z8Bu2FTPpu-dOS-Igmd3GKwYyElkZJyF?usp=sharing

JSchoenbachler avatar Dec 17 '21 16:12 JSchoenbachler

Awesome. Thanks, I have uploaded them here by adding a .txt extension. Sorry, I didn't get time to do any more tests but I have just released v3.5. Will keep this open for the next release. Have a great Xmas and Happy New Year!

command.err.txt nextflow.log.txt

drpatelh avatar Dec 17 '21 17:12 drpatelh

Unfortunately, I was unable to replicate this error with the ids.txt provided as input to nf-core/fetchngs and subsequently nf-core/rnaseq with the parameters provided in "Steps to reproduce".

Summary of what was tested:

  1. Run nf-core/fetchngs with the provided IDs as input --input ids.txt on Nextflow Tower with -r 1.10.0
  2. Run nf-core/rnaseq with the samplesheet.csv generated from fetchngs and the following parameters to follow exactly what was provided in steps to reproduce:
    --genome GRCm38 \
    --skip_markduplicates \
    --skip_bigwig \
    --skip_stringtie \
    --skip_preseq \
    --skip_dupradar \
    --skip_qualimap \
    --skip_rseqc \
    --skip_biotype_qc \
    --skip_deseq2_qc \
    --skip_alignment \
    --pseudo_aligner salmon 
  3. Pipeline completes successfully.
[ed/172522] Submitted process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GTF_GENE_FILTER (genome.fa)
[79/905ada] Submitted process > NFCORE_RNASEQ:RNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)
[33/0392d6] Submitted process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:CUSTOM_GETCHROMSIZES (genome.fa)
[fb/d05bb7] Submitted process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA (rsem/genome.fa)
[12/fcdcee] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE (SRX7546617)
[bf/3a9904] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE (SRX7546616)
[c2/6eb078] Submitted process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:SALMON_INDEX (genome.transcripts.fa)
[9e/33807b] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (SRX7546616)
[eb/dd92bb] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (SRX7546617)
[99/4b5019] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE (SRX7546616)
[14/abc538] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (SRX7546616)
[ef/a7acbb] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (SRX7546617)
[50/16fc98] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE (SRX7546617)
[ab/b2123f] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_QUANT (SRX7546616)
[a8/e9ee57] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_QUANT (SRX7546617)
[c2/0649da] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TX2GENE (genes.gtf)
[65/0d2913] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TXIMPORT
[24/9d70fb] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_GENE (salmon_tx2gene.tsv)
[fb/8e55c4] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_GENE_SCALED (salmon_tx2gene.tsv)
[a9/e31955] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_GENE_LENGTH_SCALED (salmon_tx2gene.tsv)
[12/549738] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_TRANSCRIPT (salmon_tx2gene.tsv)
[be/1964fc] Submitted process > NFCORE_RNASEQ:RNASEQ:CUSTOM_DUMPSOFTWAREVERSIONS (1)
[4d/2b1c4c] Submitted process > NFCORE_RNASEQ:RNASEQ:MULTIQC (1)
Waiting for file transfers to complete (1 files)
-[nf-core/rnaseq] Pipeline completed successfully -

Might have been resolved in earlier versions of the module or pipeline.

ejseqera avatar Jun 01 '23 14:06 ejseqera

Thanks @ejseqera ! Given that this has been tested in the latest release I will close this issue. Please feel free to reopen if the issue persists.

drpatelh avatar Jun 01 '23 16:06 drpatelh