smrnaseq
Valid `--mirtrace_species` is required even with `--mirgenedb` when `--mirgenedb_species` is given
Description of the bug
According to the documentation, the miRNA databases miRBase (used by default) and MirGeneDB are alternative sources of reference files, and either can be used. However, the parameter --mirtrace_species, which sets the species of the miRBase reference files, is always required, so there is no way to use only MirGeneDB.
Command used and terminal output
nextflow run smrnaseq -profile "$profile" \
--input "$samplefile" \
--protocol qiaseq \
--outdir "$pipeoutdir" \
--igenomes_ignore \
--genome null \
--fasta "$seqref" \
--mirgenedb \
--mirgenedb_species Hsa \
--mirgenedb_gff ... // other mirgene parameters
Error message:
Reference species for miRTrace is not defined via the --mirtrace_species parameter.
When using "--mirtrace_species null", the pipeline continues until MIRTRACE_RUN and crashes there, since "null" is not a valid genome name.
Relevant files
No response
System information
Nextflow version: 23.04.0
Hardware: HPC
Executor: slurm
Container engine: Apptainer
OS: RedHat Linux 8
Version of nf-core/smrnaseq: 2.3.0
I believe this is a structural issue. The statements using MirGeneDB are in an if clause governed by the --mirgenedb parameter, but nothing seems to control the use of miRBase ...
It shouldn't break functionality per se, but I agree this should be fixed.
Is this still a problem in dev?
I ran with these options, and mirtop ran with MirGeneDB and miRTrace worked fine.
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'
// Input data
input = 'samplesheet.csv'
mirgenedb_mature = 'hsa.fas'
mirgenedb_hairpin = 'https://mirgenedb.org/static/data/hsa/hsa-pre.fas'
mirgenedb_gff = 'hsa.gff'
mirgenedb = true
mirgenedb_species = 'Hsa'
mirtrace_species = 'hsa'
skip_mirdeep = true
protocol = 'illumina'
Happy to know more so I can help to fix this if I misunderstood.
Just to add more context: mirtrace_species is required to run miRTrace, but mirtop will quantify with the MirGeneDB files if they are supplied.
I was able to reproduce the error.
nextflow run smrnaseq/ -profile illumina,docker --outdir mirdb --fasta 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/genome.fa' --mirgenedb true --mirgenedb_species Hsa --input https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet.csv
Instead of exiting with the error "MirGeneDB gff file not found", which would be expected because we are in MirGeneDB "mode", it exits with ERROR ~ Reference species for miRTrace is not defined via the --mirtrace_species parameter.
The fix is to make this check conditional, based on whether mirtrace_species is being used. In particular, the pipeline should only check for --mirtrace_species if --mirgenedb is not set.
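To make the proposed behaviour concrete, here is a minimal sketch of the conditional check, written in Python rather than in the pipeline's own Nextflow/Groovy code. The `Params` class and the `validate` function are hypothetical stand-ins and are not taken from the smrnaseq source; the two error strings come from this thread, while the message for a missing `--mirgenedb_species` is made up for illustration. The sketch only covers parameter validation and does not address whether miRTrace itself should be skipped in MirGeneDB mode.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Params:
    """Hypothetical stand-in for the pipeline parameters discussed in this issue."""
    mirgenedb: bool = False
    mirgenedb_species: Optional[str] = None
    mirgenedb_gff: Optional[str] = None
    mirtrace_species: Optional[str] = None


def validate(params: Params) -> List[str]:
    """Return the list of validation errors for the given parameters."""
    errors = []
    if params.mirgenedb:
        # MirGeneDB "mode": only the MirGeneDB inputs are checked.
        if not params.mirgenedb_species:
            # Illustrative message, not from the pipeline.
            errors.append("MirGeneDB species not set via --mirgenedb_species")
        if not params.mirgenedb_gff:
            errors.append("MirGeneDB gff file not found")
    elif not params.mirtrace_species:
        # miRBase mode (the default): the species is required here.
        errors.append(
            "Reference species for miRTrace is not defined via the "
            "--mirtrace_species parameter."
        )
    return errors


if __name__ == "__main__":
    # Mirrors the reproduction command: --mirgenedb true --mirgenedb_species Hsa
    for message in validate(Params(mirgenedb=True, mirgenedb_species="Hsa")):
        print("ERROR ~", message)
```

With the parameters from the reproduction command, this sketch reports the missing gff file rather than the miRTrace species error, which is the behaviour described as expected above.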
I am working on this.
Adding this https://github.com/nf-core/smrnaseq/issues/131 as it has the same source of error.
Closed via #378