smrnaseq
INDEX_GENOME: Bowtie build error
Description of the bug
The genome index step fails with both the 2.3.0 and dev versions when run with the command below. All miRNA reference files are from miRBase and the genome FASTA is from Ensembl. The test run worked properly. Any suggestions on what is causing the issue?
Command used and terminal output
$ nextflow run nf-core/smrnaseq -r dev --input 'SampleSheet.csv' --outdir '/results' \
--mirtrace_species hsa --fasta 'Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz' \
--hairpin 'mature.fa' \
--mature 'hairpin.fa' \
--mirna_gtf 'hsa.gff3' \
--skip_mirdeep --protocol 'qiaseq' -profile singularity
Output:
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:INDEX_GENOME (Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz)'
Caused by:
Process `NFCORE_SMRNASEQ:INDEX_GENOME (Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz)` terminated with an error exit status (1)
Command executed:
# Remove any special base characters from reference genome FASTA file
sed '/^[^>]/s/[^ATGCatgc]/N/g' Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz > genome.edited.fa
sed -i 's/ .*//' genome.edited.fa
# Build bowtie index
bowtie-build genome.edited.fa genome --threads 6
cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:INDEX_GENOME":
bowtie: $(echo $(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*$//')
END_VERSIONS
Command exit status:
1
Command output:
Settings:
Output files: "genome.*.ebwt"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 5 (one in 32)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 24
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
genome.edited.fa
Reading reference sizes
Time reading reference sizes: 00:00:10
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Total time for call to driver() for forward index: 00:00:10
Command error:
Warning: Encountered reference sequence with only gaps
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered reference sequence with only gaps
Warning: Encountered reference sequence with only gaps
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Warning: Encountered empty reference sequence
Warning: Encountered reference sequence with only gaps
Time reading reference sizes: 00:00:10
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Reference file does not seem to be a FASTA file
Time to join reference sequences: 00:00:00
Total time for call to driver() for forward index: 00:00:10
Command: bowtie-build --wrapper basic-0 --threads 6 genome.edited.fa genome
Relevant files
No response
System information
No response
Hi @AhmedMohamed1993, did you try with the extracted (not .gz) fasta file?
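For context, the command log above shows sed being run directly on the .fa.gz file, so genome.edited.fa most likely still contains compressed bytes rather than FASTA text, which would explain the "Reference file does not seem to be a FASTA file" message. A minimal sketch of decompressing before the run follows; `ensure_plain_fasta` is a hypothetical helper written for this illustration and is not part of nf-core/smrnaseq.

```shell
#!/usr/bin/env bash
# Sketch: make sure the genome FASTA handed to --fasta is plain text.
set -euo pipefail

# ensure_plain_fasta is a hypothetical helper (an assumption for this example).
# It prints the path of an uncompressed FASTA. If the input starts with the
# gzip magic bytes (1f 8b), it is decompressed next to the original first.
ensure_plain_fasta() {
    local in="$1"
    local magic out
    # Read the first two bytes as hex, e.g. "1f8b" for gzip.
    magic=$(head -c 2 "$in" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        out="${in%.gz}"
        # If the name had no .gz suffix, pick a new name so the input is kept.
        if [ "$out" = "$in" ]; then
            out="${in}.plain.fa"
        fi
        gzip -dc "$in" > "$out"
        echo "$out"
    else
        # Already uncompressed: return the path unchanged.
        echo "$in"
    fi
}
```

The returned path can then be passed to the pipeline, for example by setting `FASTA=$(ensure_plain_fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz)` and using `--fasta "$FASTA"` in the nextflow command from the report.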
Extracting the file helped, but the run now stops at a different point.
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT'
Caused by:
Process NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT terminated with an error exit status (1)
Command executed:
#Cleanup the GTF if mirbase html form is broken
GTF="hsa.gff3"
sed 's/&gt;/>/g' $GTF | sed 's#<br>#\n#g' | sed 's#
##g' | sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' > ${GTF}_html_cleaned.gtf
mirtop gff --hairpin mature.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf -o mirtop --sps hsa ./bams/*
mirtop counts --hairpin mature.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf -o mirtop --sps hsa --add-extra --gff mirtop/mirtop.gff
mirtop export --format isomir --hairpin mature.fa_igenome.fa_idx.fa --gtf ${GTF}_html_cleaned.gtf --sps hsa -o mirtop mirtop/mirtop.gff
mirtop stats mirtop/mirtop.gff --out mirtop/stats
mv mirtop/stats/mirtop_stats.log mirtop/stats/full_mirtop_stats.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:MIRNA_QUANT:MIRTOP_QUANT":
mirtop: $(echo $(mirtop --version 2>&1) | sed 's/^.*mirtop //')
END_VERSIONS
Command exit status: 1
Command output:
['gff', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'hsa', './bams/27_post_seqcluster.bam', './bams/28_post_seqcluster.bam', './bams/29_post_seqcluster.bam']
['counts', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'hsa', '--add-extra', '--gff', 'mirtop/mirtop.gff']
['export', '--format', 'isomir', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '--sps', 'hsa', '-o', 'mirtop', 'mirtop/mirtop.gff']
['stats', 'mirtop/mirtop.gff', '--out', 'mirtop/stats']
Command error:
04/13/2024 04:04:15 INFO Filtered by being duplicated: 0
04/13/2024 04:04:15 INFO Filtered by being outside miRNA positions: 18784
04/13/2024 04:04:15 INFO Filtered by being low score: 0
04/13/2024 04:04:17 INFO It took 0.426 minutes
['gff', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'hsa', './bams/27_post_seqcluster.bam', './bams/28_post_seqcluster.bam', './bams/29_post_seqcluster.bam']
/usr/local/lib/python3.9/site-packages/mirtop/mirna/mintplates.py:512: SyntaxWarning: "is" with a literal. Did you mean "=="?
if prefix is '':
04/13/2024 04:04:20 INFO Run convert of GFF to TSV containing expression
04/13/2024 04:04:20 INFO INFO Reading GFF file mirtop/mirtop.gff
04/13/2024 04:04:20 INFO INFO Writing TSV file to directory mirtop
04/13/2024 04:04:20 INFO Missing Parents in hairpin file: 0
04/13/2024 04:04:20 INFO Missing MiRNAs in GFF file: 0
04/13/2024 04:04:20 INFO Non valid UID: 0
04/13/2024 04:04:20 INFO Output file is at mirtop/mirtop.tsv
04/13/2024 04:04:20 INFO It took 0.001 minutes
['counts', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '-o', 'mirtop', '--sps', 'hsa', '--add-extra', '--gff', 'mirtop/mirtop.gff']
/usr/local/lib/python3.9/site-packages/mirtop/mirna/mintplates.py:512: SyntaxWarning: "is" with a literal. Did you mean "=="?
if prefix is '':
04/13/2024 04:04:22 INFO Run export of GFF into other format.
04/13/2024 04:04:22 INFO INFO Writing TSV file to directory mirtop
04/13/2024 04:04:22 INFO INFO Reading GFF file mirtop/mirtop.gff
04/13/2024 04:04:22 INFO Missing Parents in hairpin file: 0
04/13/2024 04:04:22 INFO Missing MiRNAs in GFF file: 0
04/13/2024 04:04:22 INFO Non valid UID: 0
04/13/2024 04:04:22 INFO Output file is at mirtop/mirtop_rawData.tsv
04/13/2024 04:04:22 INFO It took 0.001 minutes
['export', '--format', 'isomir', '--hairpin', 'mature.fa_igenome.fa_idx.fa', '--gtf', 'hsa.gff3_html_cleaned.gtf', '--sps', 'hsa', '-o', 'mirtop', 'mirtop/mirtop.gff']
/usr/local/lib/python3.9/site-packages/mirtop/mirna/mintplates.py:512: SyntaxWarning: "is" with a literal. Did you mean "=="?
if prefix is '':
04/13/2024 04:04:24 INFO Run stats.
04/13/2024 04:04:24 INFO Reading: mirtop/mirtop.gff
['stats', 'mirtop/mirtop.gff', '--out', 'mirtop/stats']
Traceback (most recent call last):
File "/usr/local/bin/mirtop", line 10, in
Does it work if you do not specify --mirna_gtf hsa.gff3?
I am happy to help with this. Sorry I am late; I am starting to work on this pipeline more now.
If you still have access to the working directory where this error happens, I am happy to look at the files and see what is going on. Thanks!
Please open a new issue if this still persists with dev. It should just work with -r dev if you pull the pipeline again. If that's not the case, let us know in the new issue.