rnaseq
SALMON_TX2GENE failing when skip_alignment is true and pseudo_aligner is salmon
Check Documentation
I have checked the following places for your error:
Description of the bug
When using the flags to skip alignment and use salmon as a pseudo aligner (--skip_alignment --pseudo_aligner salmon) I get the following error:
Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TX2GENE (genes.gtf)'
Caused by:
Missing output file(s) `*.tsv` expected by process `NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TX2GENE (genes.gtf)`
Command executed:
salmon_tx2gene.py \
--gtf genes.gtf \
--salmon salmon \
--id gene_id \
--extra gene_name \
-o salmon_tx2gene.tsv
cat <<-END_VERSIONS > versions.yml
SALMON_TX2GENE:
python: $(python --version | sed 's/Python //g')
END_VERSIONS
However, simply removing those flags makes the command execute successfully. I have tested this using data for mouse and human genomes.
Steps to reproduce
- Create the following as ids.txt:
SRR10877044
SRR10877043
- Run the fetchngs command to get the samplesheet:
nextflow run nf-core/fetchngs --input ids.txt --max_memory 60.GB --max_cpus 12 -r 1.3
- Modify samplesheet to remove extra columns and add strandedness:
sample,fastq_1,fastq_2,strandedness
SRX7546616,./results/fastq/SRX7546616_T1_1.fastq.gz,./results/fastq/SRX7546616_T1_2.fastq.gz,reverse
SRX7546617,./results/fastq/SRX7546617_T1_1.fastq.gz,./results/fastq/SRX7546617_T1_2.fastq.gz,reverse
- Run the rnaseq command:
nextflow run nf-core/rnaseq --input samplesheet.csv --genome GRCm38 --max_memory 60.GB --max_cpus 12 --outdir results2 --skip_markduplicates --skip_bigwig --skip_stringtie --skip_preseq --skip_dupradar --skip_qualimap --skip_rseqc --skip_biotype_qc --skip_deseq2_qc --salmon_index "/home/ubuntu/genomes/alias/mm10/salmon_partial_sa_index/default" --skip_alignment --pseudo_aligner salmon -profile docker -r 3.4
Again, the above uses a mouse genome, but the same behavior occurs when tested with a human genome.
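The samplesheet edit in the third step can be sketched in Python as follows. This is a minimal sketch, not part of either pipeline: the function name `trim_samplesheet` and the `extra_column` field in the demo row are hypothetical, and it assumes the fetchngs samplesheet already has `sample`, `fastq_1` and `fastq_2` columns and that every sample shares the same strandedness.

```python
import csv
import io

# Columns kept from the fetchngs samplesheet; everything else is dropped.
KEEP = ["sample", "fastq_1", "fastq_2"]


def trim_samplesheet(text, strandedness="reverse"):
    """Return CSV text with only the KEEP columns plus a strandedness column."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(KEEP + ["strandedness"])
    for row in reader:
        writer.writerow([row[col] for col in KEEP] + [strandedness])
    return out.getvalue()


# Demo row: sample name and paths are taken from the steps above,
# "extra_column" is a made-up stand-in for the columns being removed.
demo = (
    "sample,fastq_1,fastq_2,extra_column\n"
    "SRX7546616,./results/fastq/SRX7546616_T1_1.fastq.gz,"
    "./results/fastq/SRX7546616_T1_2.fastq.gz,x\n"
)
print(trim_samplesheet(demo), end="")
```

For a real run, the text would come from the samplesheet that fetchngs wrote and the result would be saved as samplesheet.csv.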
Expected behaviour
The run should succeed, and should actually finish faster, since it would simply be skipping the alignment step.
Log files
Have you provided the following extra information/files:
- [X] The command used to run the pipeline
- [ ] The .nextflow.log file
*Have not provided the log file, but can if requested.
System
- Hardware: Amazon EC2 m5.4xlarge (but also happens on smaller instances)
- OS: Ubuntu
- Version 20.04
Nextflow Installation
- Version: 21.04.0
Container engine
- Engine: Docker
- version: 20.10.8
Hi @JSchoenbachler ! Hope you are well and apologies for the delay. I have tried to reproduce the error you reported but haven't had much luck. Be great if you can try to reproduce using the following minimal test example please:
nextflow pull nf-core/rnaseq
nextflow run nf-core/rnaseq \
--input https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.4/samplesheet_test.csv \
--fasta https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fa \
--gtf https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf.gz \
--skip_alignment \
--pseudo_aligner salmon \
-r dev \
-profile singularity
Quite a lot has changed on the dev branch of late as we have overhauled the default NF syntax in the pipeline. I am quietly hoping this has been fixed 🤞🏽
Hey @drpatelh , thanks for getting back to me!
I tried the command you provided above on the test samplesheet (only thing I did differently was to change profile to docker since that's how I'm running it) and it worked.
However, when I tried using -r dev on my command I originally provided above, I got the same error result. Did you try with my command and genome?
Thanks for giving it a go @JSchoenbachler. I tried to replicate the command you used with a test dataset but if you are still getting the error then I will try with the full-sized data. Will report back. Thanks!
@drpatelh okay thanks! I think everything I provided should be sufficient in debugging this, but if it isn't let me know!
Unable to replicate @JSchoenbachler 😏
I tried on a full-sized dataset on Nextflow Tower using the options below and it worked:
--genome GRCh37 --skip_alignment --pseudo_aligner salmon -r dev

I also tried to reproduce the command to run locally using the smaller test data with the command below and that worked too:
nextflow run main.nf \
--genome 'R64-1-1' \
--input https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.4/samplesheet_test.csv \
--skip_alignment \
--pseudo_aligner salmon \
-profile singularity
Be great if you are able to isolate the problem a little more or try and generate a smaller test we can use to reproduce. Will keep this open for now and bump to the next release milestone.
Thanks!
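One way to isolate the problem further might be to compare transcript names. The failing command passes its own --salmon_index, while the test commands earlier in this thread do not, and salmon_tx2gene.py is given both genes.gtf and the salmon results directory. Nothing in this thread confirms that a naming mismatch between the two is the cause; it is only a hypothesis that is cheap to rule out. The sketch below assumes the GTF carries `transcript_id "..."` attributes and that each quant.sf is tab-separated with the transcript name in the first column after a header line. All function names are hypothetical, and the demo IDs are made up.

```python
import re

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_text):
    """Collect every transcript_id value found in GTF text."""
    ids = set()
    for line in gtf_text.splitlines():
        if line.startswith("#"):
            continue  # skip header/comment lines
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def quant_transcript_ids(quant_text):
    """Collect the first column of a quant.sf file, skipping its header."""
    lines = quant_text.splitlines()
    return {line.split("\t")[0] for line in lines[1:] if line.strip()}


def overlap_fraction(gtf_text, quant_text):
    """Fraction of quantified transcripts whose name also appears in the GTF."""
    quant_ids = quant_transcript_ids(quant_text)
    if not quant_ids:
        return 0.0
    return len(quant_ids & gtf_transcript_ids(gtf_text)) / len(quant_ids)


# Made-up demo: one name matches, one carries a version suffix and does not.
gtf_demo = 'chr1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "G1"; transcript_id "TX1";\n'
quant_demo = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "TX1\t10\t5\t1.0\t2\n"
    "TX2.1\t10\t5\t1.0\t2\n"
)
print(overlap_fraction(gtf_demo, quant_demo))
```

On real data, the two arguments would be the contents of genes.gtf and of one sample's quant.sf. A fraction near zero would point at mismatched annotations, while a fraction near one would rule this hypothesis out.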
@drpatelh That's strange. If you don't mind could you follow my steps in the "Steps to reproduce" section, including the data I use as a source? I provided the contents of my ids.txt and samplesheet.csv, so you should be able to take that and run it.
Are you able to send me the .command.out and .command.err files in the work directory for the failing process? (and the .nextflow.log if you have it handy)? If it's sample related then there should be some sort of indication there.
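For anyone gathering the same files, here is a minimal sketch of one way to find them. It assumes the saved error report contains a "Work dir:" line followed by the task's path on the next line, as Nextflow error reports typically do, and that the path contains no spaces. The function names `find_work_dirs` and `collect_task_logs` are hypothetical, as is the path in the demo.

```python
import re
from pathlib import Path

# "Work dir:" on its own line, with the path on the following line.
WORK_DIR = re.compile(r"^Work dir:\s*\n\s*(\S+)", re.MULTILINE)


def find_work_dirs(report):
    """Return every work directory path mentioned in an error report."""
    return WORK_DIR.findall(report)


def collect_task_logs(work_dir, names=(".command.out", ".command.err")):
    """Read the requested task files; a missing file maps to None."""
    logs = {}
    for name in names:
        path = Path(work_dir) / name
        logs[name] = path.read_text() if path.exists() else None
    return logs


# Demo with a made-up report excerpt.
report_demo = "Command error:\n  ...\n\nWork dir:\n  /tmp/work/ab/123456\n"
for work_dir in find_work_dirs(report_demo):
    print(work_dir, collect_task_logs(work_dir))
```

Because the task files start with a dot, they are hidden from a plain `ls`, which is easy to overlook when browsing the work directory by hand.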
Here are my files (since GitHub won't let me attach files with unsupported extensions): https://drive.google.com/drive/folders/1Z8Bu2FTPpu-dOS-Igmd3GKwYyElkZJyF?usp=sharing
Awesome, thanks. I have uploaded them here by adding a .txt extension. Sorry, I didn't get time to do any more tests, but I have just released v3.5. Will keep this open for the next release. Have a great Xmas and Happy New Year!
Unfortunately, I was unable to replicate this error with the ids.txt provided as input to nf-core/fetchngs and subsequently nf-core/rnaseq with the parameters provided in "Steps to reproduce".
Summary of what was tested:
- Run nf-core/fetchngs with the provided IDs as input (--input ids.txt) on Nextflow Tower with -r 1.10.0
- Run nf-core/rnaseq with the samplesheet.csv generated from fetchngs and the following parameters to follow exactly what was provided in steps to reproduce:
--genome GRCm38 \
--skip_markduplicates \
--skip_bigwig \
--skip_stringtie \
--skip_preseq \
--skip_dupradar \
--skip_qualimap \
--skip_rseqc \
--skip_biotype_qc \
--skip_deseq2_qc \
--skip_alignment \
--pseudo_aligner salmon
- Pipeline completes successfully.
[ed/172522] Submitted process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GTF_GENE_FILTER (genome.fa)
[79/905ada] Submitted process > NFCORE_RNASEQ:RNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet.csv)
[33/0392d6] Submitted process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:CUSTOM_GETCHROMSIZES (genome.fa)
[fb/d05bb7] Submitted process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA (rsem/genome.fa)
[12/fcdcee] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE (SRX7546617)
[bf/3a9904] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:FQ_SUBSAMPLE (SRX7546616)
[c2/6eb078] Submitted process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:SALMON_INDEX (genome.transcripts.fa)
[9e/33807b] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (SRX7546616)
[eb/dd92bb] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_QUANT (SRX7546617)
[99/4b5019] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE (SRX7546616)
[14/abc538] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (SRX7546616)
[ef/a7acbb] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC (SRX7546617)
[50/16fc98] Submitted process > NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE (SRX7546617)
[ab/b2123f] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_QUANT (SRX7546616)
[a8/e9ee57] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_QUANT (SRX7546617)
[c2/0649da] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TX2GENE (genes.gtf)
[65/0d2913] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_TXIMPORT
[24/9d70fb] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_GENE (salmon_tx2gene.tsv)
[fb/8e55c4] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_GENE_SCALED (salmon_tx2gene.tsv)
[a9/e31955] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_GENE_LENGTH_SCALED (salmon_tx2gene.tsv)
[12/549738] Submitted process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_SALMON:SALMON_SE_TRANSCRIPT (salmon_tx2gene.tsv)
[be/1964fc] Submitted process > NFCORE_RNASEQ:RNASEQ:CUSTOM_DUMPSOFTWAREVERSIONS (1)
[4d/2b1c4c] Submitted process > NFCORE_RNASEQ:RNASEQ:MULTIQC (1)
Waiting for file transfers to complete (1 files)
-[nf-core/rnaseq] Pipeline completed successfully -
Might have been resolved in earlier versions of the module or pipeline.
Thanks @ejseqera ! Given that this has been tested in the latest release I will close this issue. Please feel free to reopen if the issue persists.