smrnaseq
--mirna_gtf for organism with no miRBase GFF file
Description of the bug
I'm using sheep miRNA data. miRBase contains a few entries for sheep miRNAs but does not provide a GFF file on its download page. I instead used a GFF of sheep miRNAs from the RumimiR database (https://rumimir.sigenae.org/), but the pipeline fails with an error at the mirtop_quant step.
I've uploaded the GFF file used, but changed the file extension to .txt so that it could be uploaded.
My params file:
input: '/gpfs01/home/sbzoh/F1_Seminal_Plasma_RNA/rawData/Fastq/F1_SeminalPlasma_Samplesheet.csv'
outdir: '/gpfs01/home/sbzoh/F1_Seminal_Plasma_RNA/smrnaseq_output'
with_umi: false
mirtrace_species: 'oar'
fasta: '/gpfs01/home/sbzoh//refGenome/Ovis_aries_rambouillet.ARS-UI_Ramb_v2.0.dna.toplevel.fasta'
mirna_gtf: '/gpfs01/home/sbzoh//refGenome/rumimir_sheep.gff'
mature: '/gpfs01/home/sbzoh//refGenome/mature.fa'
hairpin: '/gpfs01/home/sbzoh//refGenome/hairpin.fa'
filter_contamination: false
skip_mirdeep: true
Command used and terminal output
## Command used
nextflow run nf-core/smrnaseq -profile singularity -params-file params.yaml
## Tail of output containing error
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/smrnaseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT'
Caused by:
Process `NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT` terminated with an error exit status (1)
Command executed:
#Cleanup the GTF if mirbase html form is broken
GTF="rumimir_sheep.gff"
sed 's/&gt;/>/g' $GTF | sed 's#<br>#\n#g' | sed 's#</p>##g' | sed 's#<p>##g' | sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' > ${GTF}_html_cleaned.gtf
mirtop gff --hairpin hairpin.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf -o mirtop --sps oar ./bams/*
mirtop counts --hairpin hairpin.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf -o mirtop --sps oar --add-extra --gff mirtop/mirtop.gff
mirtop export --format isomir --hairpin hairpin.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf --sps oar -o mirtop mirtop/mirtop.gff
mirtop stats mirtop/mirtop.gff --out mirtop/stats
mv mirtop/stats/mirtop_stats.log mirtop/stats/full_mirtop_stats.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT":
mirtop: $(echo $(mirtop --version 2>&1) | sed 's/^.*mirtop //')
END_VERSIONS
Command exit status:
1
Command output:
['gff', '--hairpin', 'hairpin.fa_igenome.fa_idx.fa', '--gtf', 'rumimir_sheep.gff_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'oar', './bams/Sire_A_8324_Control_seqcluster.bam', './bams/Sire_A_8401_Control_seqcluster.bam', './bams/Sire_A_8631_Biosolids_seqcluster.bam', './bams/Sire_A_8698_Biosolids_seqcluster.bam', './bams/Sire_B_8335_Control_seqcluster.bam', './bams/Sire_B_8433_Control_seqcluster.bam', './bams/Sire_B_8607_Biosolids_seqcluster.bam', './bams/Sire_B_8796_Biosolids_seqcluster.bam', './bams/Sire_C_8235_Control_seqcluster.bam', './bams/Sire_C_8431_Control_seqcluster.bam', './bams/Sire_C_8747_Biosolids_seqcluster.bam', './bams/Sire_C_8767_Biosolids_seqcluster.bam', './bams/Sire_D_8231_Control_seqcluster.bam', './bams/Sire_D_8416_Control_seqcluster.bam', './bams/Sire_D_8744_Biosolids_seqcluster.bam', './bams/Sire_D_8758_Biosolids_seqcluster.bam']
Command error:
/usr/local/lib/python3.9/site-packages/mirtop/mirna/mintplates.py:512: SyntaxWarning: "is" with a literal. Did you mean "=="?
if prefix is '':
03/20/2024 06:34:35 INFO Run annotation
03/20/2024 06:34:35 ERROR Database not found in --mirna rumimir_sheep.gff_html_cleaned.gtf. Use --database argument to add a custom source.
['gff', '--hairpin', 'hairpin.fa_igenome.fa_idx.fa', '--gtf', 'rumimir_sheep.gff_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'oar', './bams/Sire_A_8324_Control_seqcluster.bam', './bams/Sire_A_8401_Control_seqcluster.bam', './bams/Sire_A_8631_Biosolids_seqcluster.bam', './bams/Sire_A_8698_Biosolids_seqcluster.bam', './bams/Sire_B_8335_Control_seqcluster.bam', './bams/Sire_B_8433_Control_seqcluster.bam', './bams/Sire_B_8607_Biosolids_seqcluster.bam', './bams/Sire_B_8796_Biosolids_seqcluster.bam', './bams/Sire_C_8235_Control_seqcluster.bam', './bams/Sire_C_8431_Control_seqcluster.bam', './bams/Sire_C_8747_Biosolids_seqcluster.bam', './bams/Sire_C_8767_Biosolids_seqcluster.bam', './bams/Sire_D_8231_Control_seqcluster.bam', './bams/Sire_D_8416_Control_seqcluster.bam', './bams/Sire_D_8744_Biosolids_seqcluster.bam', './bams/Sire_D_8758_Biosolids_seqcluster.bam']
Traceback (most recent call last):
File "/usr/local/bin/mirtop", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/mirtop/command_line.py", line 31, in main
reader(kwargs["args"])
File "/usr/local/lib/python3.9/site-packages/mirtop/gff/__init__.py", line 24, in reader
database = mapper.guess_database(args)
File "/usr/local/lib/python3.9/site-packages/mirtop/mirna/mapper.py", line 23, in guess_database
return _guess_database_file(args.gtf, args.database)
File "/usr/local/lib/python3.9/site-packages/mirtop/mirna/mapper.py", line 40, in _guess_database_file
raise ValueError("Database not found in %s header" % gff)
ValueError: Database not found in rumimir_sheep.gff_html_cleaned.gtf header
Work dir:
/gpfs01/home/sbzoh/F1_Seminal_Plasma_RNA/work/d8/42e9ee613e17eb83f5262cfae51a33
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
Relevant files
nextflow.log rumimir_sheep.txt
System information
Nextflow version: 23.10.1
Hardware: HPC
Executor: slurm
Container engine: Singularity
OS: CentOS Linux
Version of nf-core/smrnaseq: 2.3.0
I tried again on the latest version (2.3.1) and got the same error.
Hi @OliverH96,
for now you could try to pass the additional argument --database to mirtop using a custom config. This would require adding something like:
process {
    withName: 'MIRTOP_QUANT' {
        ext.args = "--database RumimiR"
    }
}
You have to check if RumimiR is the term used in your provided gff. As far as I understand, mirtop searches for known tags in the gff file and therefore fails in your case.
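One way to check which term the GFF header actually carries is to print its leading comment lines and search them. The following is only a minimal sketch: the helper names `header_lines` and `header_mentions` are made up for illustration, and it does not reproduce mirtop's own detection logic, which may look for different tags.

```python
from typing import Iterable, List


def header_lines(gff_lines: Iterable[str]) -> List[str]:
    """Return the leading comment lines (starting with '#') of a GFF file."""
    header = []
    for line in gff_lines:
        if not line.strip():
            continue  # ignore blank lines
        if not line.startswith("#"):
            break  # the header ends at the first feature line
        header.append(line.rstrip("\n"))
    return header


def header_mentions(gff_lines: Iterable[str], term: str) -> bool:
    """True if any header line contains `term` (case-insensitive)."""
    needle = term.lower()
    return any(needle in line.lower() for line in header_lines(gff_lines))
```

For example, `header_mentions(open("rumimir_sheep.gff"), "RumimiR")` would report whether that name appears anywhere in the header. If the header is empty or names no database at all, that would be consistent with the "Database not found in ... header" error above.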
Apologies for getting back to you so late. This did seem to advance the pipeline slightly, but I am now getting a different error:
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT'
Caused by:
Process `NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT` terminated with an error exit status (1)
Command executed:
#Cleanup the GTF if mirbase html form is broken
GTF="rumimir_sheep.gff"
sed 's/&gt;/>/g' $GTF | sed 's#<br>#\n#g' | sed 's#</p>##g' | sed 's#<p>##g' | sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' > ${GTF}_html_cleaned.gtf
mirtop gff --hairpin hairpin.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf -o mirtop --sps oar ./bams/*
mirtop counts --hairpin hairpin.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf -o mirtop --sps oar --add-extra --gff mirtop/mirtop.gff
mirtop export --format isomir --hairpin hairpin.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf --sps oar -o mirtop mirtop/mirtop.gff
mirtop stats mirtop/mirtop.gff --out mirtop/stats
mv mirtop/stats/mirtop_stats.log mirtop/stats/full_mirtop_stats.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT":
mirtop: $(echo $(mirtop --version 2>&1) | sed 's/^.*mirtop //')
END_VERSIONS
Command exit status:
1
Command output:
['gff', '--hairpin', 'hairpin.fa_igenome.fa_idx.fa', '--gtf', 'rumimir_sheep.gff_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'oar', './bams/Sire_A_8324_Control_seqcluster.bam', './bams/Sire_A_8401_Control_seqcluster.bam', './bams/Sire_A_8631_Biosolids_seqcluster.bam', './bams/Sire_A_8698_Biosolids_seqcluster.bam', './bams/Sire_B_8335_Control_seqcluster.bam', './bams/Sire_B_8433_Control_seqcluster.bam', './bams/Sire_B_8607_Biosolids_seqcluster.bam', './bams/Sire_B_8796_Biosolids_seqcluster.bam', './bams/Sire_C_8235_Control_seqcluster.bam', './bams/Sire_C_8431_Control_seqcluster.bam', './bams/Sire_C_8747_Biosolids_seqcluster.bam', './bams/Sire_C_8767_Biosolids_seqcluster.bam', './bams/Sire_D_8231_Control_seqcluster.bam', './bams/Sire_D_8416_Control_seqcluster.bam', './bams/Sire_D_8744_Biosolids_seqcluster.bam', './bams/Sire_D_8758_Biosolids_seqcluster.bam']
Command error:
/usr/local/lib/python3.9/site-packages/mirtop/mirna/mintplates.py:512: SyntaxWarning: "is" with a literal. Did you mean "=="?
if prefix is '':
05/02/2024 05:12:45 INFO Run annotation
05/02/2024 05:12:45 INFO Database different than miRBase or MirGeneDB
05/02/2024 05:12:45 INFO If you get an error when loading,
05/02/2024 05:12:45 INFO report it to https://github.com/miRTop/mirtop/issues
['gff', '--hairpin', 'hairpin.fa_igenome.fa_idx.fa', '--gtf', 'rumimir_sheep.gff_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'oar', './bams/Sire_A_8324_Control_seqcluster.bam', './bams/Sire_A_8401_Control_seqcluster.bam', './bams/Sire_A_8631_Biosolids_seqcluster.bam', './bams/Sire_A_8698_Biosolids_seqcluster.bam', './bams/Sire_B_8335_Control_seqcluster.bam', './bams/Sire_B_8433_Control_seqcluster.bam', './bams/Sire_B_8607_Biosolids_seqcluster.bam', './bams/Sire_B_8796_Biosolids_seqcluster.bam', './bams/Sire_C_8235_Control_seqcluster.bam', './bams/Sire_C_8431_Control_seqcluster.bam', './bams/Sire_C_8747_Biosolids_seqcluster.bam', './bams/Sire_C_8767_Biosolids_seqcluster.bam', './bams/Sire_D_8231_Control_seqcluster.bam', './bams/Sire_D_8416_Control_seqcluster.bam', './bams/Sire_D_8744_Biosolids_seqcluster.bam', './bams/Sire_D_8758_Biosolids_seqcluster.bam']
Traceback (most recent call last):
File "/usr/local/bin/mirtop", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/mirtop/command_line.py", line 31, in main
reader(kwargs["args"])
File "/usr/local/lib/python3.9/site-packages/mirtop/gff/__init__.py", line 28, in reader
matures = mapper.read_gtf_to_precursor(args.gtf)
File "/usr/local/lib/python3.9/site-packages/mirtop/mirna/mapper.py", line 172, in read_gtf_to_precursor
mapped = read_gtf_to_precursor_mirbase(gtf)
File "/usr/local/lib/python3.9/site-packages/mirtop/mirna/mapper.py", line 333, in read_gtf_to_precursor_mirbase
id_dict[idname[0]] = name[0]
IndexError: list index out of range
Work dir:
/gpfs01/home/sbzoh/F1_Seminal_Plasma_RNA/work/bb/f388eaca99ec7268114f74a3fb2490
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
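The traceback stops at `id_dict[idname[0]] = name[0]` inside `read_gtf_to_precursor_mirbase`, which suggests the file is being parsed as if it followed the miRBase layout. One possible explanation, which is an assumption and not confirmed against mirtop's source, is that some feature lines lack an attribute the parser expects. The sketch below lists feature lines whose ninth column is missing `ID` or `Name`; those two keys are what miRBase GFF3 files use and are assumed here, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def lines_missing_keys(
    gff_lines: Iterable[str],
    required: Sequence[str] = ("ID", "Name"),  # assumed keys, miRBase-style
) -> List[Tuple[int, List[str]]]:
    """Return (line_number, missing_keys) for feature lines lacking a required key."""
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.rstrip("\n").split("\t")
        # A feature line with fewer than nine columns has no attributes at all.
        attrs = parse_attributes(columns[8]) if len(columns) >= 9 else {}
        missing = [key for key in required if key not in attrs]
        if missing:
            problems.append((number, missing))
    return problems
```

Running `lines_missing_keys(open("rumimir_sheep.gff"))` and getting a non-empty list would point to the lines worth inspecting first. An empty list would not rule out other format differences.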
Hi @OliverH96, this one is a bit tough to debug without being able to run the pipeline with the exact settings that you used. Would it be possible for you to share a tar.gz file with the input files, or truncated versions of them, here? The ones I need are input, fasta, mirna_gtf, mature, and hairpin.
Please also use the latest dev version, because this issue might have been solved when updating to the latest mirtop.
I recently tried to debug this using the following command:
nextflow run nf-core/smrnaseq -r dev -latest -profile docker --input https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet-full.csv --outdir results --with_umi false --mirtrace_species oar --fasta ../files/Ovis_aries_rambouillet.ARS-UI_Ramb_v2.0.dna.toplevel.fa.gz --mirna_gtf ../files/rumimir_sheep.gff --mature https://github.com/nf-core/test-datasets/raw/smrnaseq/miRBase/mature.fa --hairpin https://github.com/nf-core/test-datasets/raw/smrnaseq/miRBase/hairpin.fa --filter_contamination false --skip_mirdeep true -c ../files/rumimir.config -resume
The fasta was downloaded from https://ftp.ensembl.org/pub/release-112/fasta/ovis_aries_rambouillet/dna/Ovis_aries_rambouillet.ARS-UI_Ramb_v2.0.dna.toplevel.fa.gz
The gff was downloaded from https://rumimir.sigenae.org/
The rumimir.config file contains the following:
process {
    withName: 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_GFF' {
        ext.args = "--database RumimiR"
    }
}
However, I am still getting the missing database error, so I opened a ticket for this in mirtop: https://github.com/miRTop/mirtop/issues/90