Valid `--mirtrace_species` is required even with `--mirgenedb` when `--mirgenedb_species` is given

Open tdanhorn opened this issue 1 year ago • 4 comments

Description of the bug

According to the documentation, it appears that the miRNA databases miRBase (used by default) and MirGeneDB are alternative sources of reference files, and either can be used. However, the parameter --mirtrace_species, which sets the species of the miRBase reference files, is required and there is no way to only use MirGeneDB.

Command used and terminal output

nextflow run smrnaseq -profile "$profile" \
        --input "$samplefile" \
        --protocol qiaseq \
        --outdir "$pipeoutdir" \
        --igenomes_ignore \
        --genome null \
        --fasta "$seqref" \
        --mirgenedb \
        --mirgenedb_species Hsa \
        --mirgenedb_gff ... // other mirgene parameters

Error message:
Reference species for miRTrace is not defined via the --mirtrace_species parameter.

When using "--mirtrace_species null", the pipeline continues until MIRTRACE_RUN and crashes there, since "null" is not a valid genome name.

Relevant files

No response

System information

Nextflow version: 23.04.0
Hardware: HPC
Executor: slurm
Container engine: Apptainer
OS: RedHat Linux 8
Version of nf-core/smrnaseq: 2.3.0

tdanhorn avatar May 01 '24 06:05 tdanhorn

I believe this is a structural issue. The statements using MirGeneDB are in an if clause governed by the --mirgenedb parameter, but nothing seems to control the use of miRBase ...

tdanhorn avatar May 01 '24 06:05 tdanhorn

It shouldn't break functionality per se, but I agree this should be fixed.

apeltzer avatar May 04 '24 20:05 apeltzer

Is this still a problem in dev?

I ran with these options; mirtop ran with MirGeneDB and mirtrace worked fine.

    config_profile_name        = 'Test profile'
    config_profile_description = 'Minimal test dataset to check pipeline function'

    // Limit resources so that this can run on GitHub Actions
    max_cpus   = 2
    max_memory = '6.GB'
    max_time   = '6.h'

    // Input data
    input             = 'samplesheet.csv'
    mirgenedb_mature  = 'hsa.fas'
    mirgenedb_hairpin = 'https://mirgenedb.org/static/data/hsa/hsa-pre.fas'
    mirgenedb_gff     = 'hsa.gff'
    mirgenedb         = true
    mirgenedb_species = 'Hsa'
    mirtrace_species  = 'hsa'
    skip_mirdeep      = true
    protocol          = 'illumina'

Happy to know more so I can help to fix this if I misunderstood.

lpantano avatar Jun 28 '24 20:06 lpantano

Just to add more context: mirtrace_species is required to run mirtrace, but mirtop will quantify with the MirGeneDB files if they are supplied.

lpantano avatar Jun 28 '24 20:06 lpantano

I was able to reproduce the error.

nextflow run smrnaseq/ -profile illumina,docker \
        --outdir mirdb \
        --fasta 'https://github.com/nf-core/test-datasets/raw/smrnaseq/reference/genome.fa' \
        --mirgenedb true \
        --mirgenedb_species Hsa \
        --input https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet.csv

Instead of exiting with the error "MirGeneDB gff file not found", which would be expected because we are in MirGeneDB "mode", it exits with "ERROR ~ Reference species for miRTrace is not defined via the --mirtrace_species parameter."

The fix is to make this check conditional, based on whether mirtrace_species is being used. In particular, the pipeline should only check for --mirtrace_species if --mirgenedb is not set.
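
To make the intended behaviour concrete, here is a rough sketch of the conditional check in Python. It is not the pipeline's actual Groovy code and not necessarily what the eventual fix looks like; the function name `check_reference_params` and its signature are made up for illustration, and only the two error messages quoted in this thread are modelled.

```python
from typing import List, Optional


def check_reference_params(
    mirgenedb: bool,
    mirtrace_species: Optional[str] = None,
    mirgenedb_gff: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return error messages for inconsistent parameters.

    An empty list means the combination of parameters is acceptable.
    """
    errors = []
    if mirgenedb:
        # MirGeneDB "mode": complain about missing MirGeneDB inputs,
        # and do not insist on --mirtrace_species.
        if not mirgenedb_gff:
            errors.append("MirGeneDB gff file not found.")
    elif not mirtrace_species:
        # miRBase (default) mode: the miRTrace species is mandatory.
        errors.append(
            "Reference species for miRTrace is not defined "
            "via the --mirtrace_species parameter."
        )
    return errors


# Mirrors the reproduction command above: --mirgenedb set, no gff supplied.
print(check_reference_params(mirgenedb=True))
```

With this ordering, the reproduction case reports the missing MirGeneDB gff file rather than the miRTrace species, which is the error one would expect in MirGeneDB mode.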

I am working on this.

atrigila avatar Aug 15 '24 21:08 atrigila

Adding this https://github.com/nf-core/smrnaseq/issues/131 as it has the same source of error.

atrigila avatar Aug 16 '24 13:08 atrigila

Closed via #378

atrigila avatar Aug 20 '24 18:08 atrigila