bowtie2
bowtie2-build runs without error but without constructing .rev files
Hi,
I have just started learning bowtie2 and have run into a problem with database indexing using the bowtie2-build command: indexing runs to completion, but the output does not include the .rev files, which prevents me from aligning against this database. The database to be indexed is a FASTA file containing a collection of bacterial genomes. I have previously indexed and aligned to a database of the same kind (merged FASTA files) without trouble, so I have no idea what is wrong with my settings. I get the 4 other files in reasonable time, although the software performed multiple memory-usage tests before one passed.
I could not find any help online regarding this question. I wonder what I'm doing wrong?
BRs,
Anna
Would it be possible to share the FASTA?
Hi,
The FASTA file is too big to post here, but it has been downloaded from here:
https://www.hmpdacc.org/hmp/catalog/grid.php?dataset=genomic&hmp_isolation_body_site=gastrointestinal_tract
Edit. I could also send the file if this is more convenient?
Anna
Hi, I'm having a very similar problem trying to run bowtie2-build on two different FASTAs (one from metaphlan2's marker database and another that was built from chocophlan). Was a solution ever found? I'm not actually getting an error message, but I think it's stopping before it finishes the index (see the output below). This is on version 2.3.4.1, which I installed with bioconda.
INFO: Command: bowtie2-build-s --wrapper basic-0 -f Humann2Test/A1HF_humann2_temp/A1HF_custom_chocophlan_database.ffn Humann2Test/A1HF_humann2_temp/A1HF_bowtie2_index
Settings:
Output files: "Humann2Test/A1HF_humann2_temp/A1HF_bowtie2_index.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
Humann2Test/A1HF_humann2_temp/A1HF_custom_chocophlan_database.ffn
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 6990730
Using parameters --bmax 5243048 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 5243048 --dcv 1024
Constructing suffix-array element generator
If a solution or workaround was found, I'd be very interested to know. Thanks for your help, Chris
Hi,
I did not get official advice on this one, but it is working now. Just yesterday I split my enormous FASTA file into two (8.4 GB --> 4.2 GB each), and indexing now seems to work, for the first part at least (I have not run the alignment yet, but the .rev files did appear once indexing completed)! Could you try this approach too? ☺ I think I'm going to align my data against both FASTA files and then combine the alignment SAM files.
BRs,
Anna Sorjamaa
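A minimal sketch of the split described above, assuming a plain-text multi-FASTA in which every record starts with a '>' header line. The function name split_fasta_in_two and the output file names are hypothetical. It splits by record count rather than by bytes, so the two parts are only roughly equal in size when the genomes differ in length.

```shell
# Split a multi-FASTA file into two files at a record boundary.
# Usage: split_fasta_in_two input.fa out_prefix
# Writes out_prefix.part1.fa and out_prefix.part2.fa (hypothetical names).
split_fasta_in_two() {
    in=$1
    prefix=$2
    # Count records: every record starts with a '>' header line.
    total=$(grep -c '^>' "$in")
    # First part gets the first half of the records, rounded up.
    half=$(( (total + 1) / 2 ))
    awk -v half="$half" -v p="$prefix" '
        /^>/ { n++ }    # a header line starts a new record
        { print > (n <= half ? p ".part1.fa" : p ".part2.fa") }
    ' "$in"
}
```

Each part can then be indexed separately with bowtie2-build. Note that reads aligned against two separate indexes are scored independently, so combining the SAM files afterwards may need extra care for reads that match genomes in both parts.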
Hi, I'm having exactly this problem as well - I installed bowtie2 automatically as part of humann2 (version 2.3.4.1), and bowtie2-build will not build .rev.bt2 files from any fasta file I give it, including as part of the MetaPhlAn pipeline indexing the database, but doesn't throw any errors either. It also produces a .2.bt2 file of 0kb, and (for the failed MetaPhlAn database index, where I have a comparison) the other files are smaller than the correctly-indexed database files provided by the makers of that software.
Really keen to figure out if this is a bowtie problem, a server problem, a data problem, or something else!
This is the output I get when I try:
Settings:
Output files: "04_MAPPING/contigs.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
03_CONTIGS/contigs.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:01
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:01
bmax according to bmaxDivN setting: 17568529
Using parameters --bmax 13176397 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 13176397 --dcv 1024
Constructing suffix-array element generator
I have already begun looking into this. I will update this thread as soon as I have more information.
Quick update: I installed versions 2.2.4 and 2.3.0 in conda environments and tried indexing the same file. Version 2.2.4 created all 6 output files without a problem; version 2.3.0 stopped at the same point as 2.3.4.1 and created the same 4 too-small files, but gave the error Segmentation fault (core dumped) when it did.
@whidbeyc -- can you share your command line? How big is your index? @AnnaSOFI -- I am not able to access the files on the website you provided. Would it be possible to share the link some other way? Maybe a dropbox link?
Same problem here.
I installed bowtie2 2.3.4.1 with conda.
Building the index of a fungal genome finished with no errors.
The genome is ~10^7 bp.
The command I used:
bowtie2-build -f fungi.fa fungi_bt2
bowtie2 --local -x fungi_bt2 -f -U contigs.fasta -S bowtie.contigs.fungi.sam -p 1
The error I got:
Could not open index file fungi_bt2.rev.1.bt2
Could not open index file fungi_bt2.rev.2.bt2
Segmentation fault (core dumped)
(ERR): bowtie2-align exited with value 139
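The missing files reported above can be caught before alignment starts. The following is a minimal sketch, assuming a small index (the .bt2 extension seen throughout this thread; large indexes use .bt2l, which this sketch does not handle). The function name check_bt2_index is hypothetical.

```shell
# Report any missing or empty file of a small Bowtie 2 index.
# Usage: check_bt2_index index_prefix
# Returns 0 if all six files exist and are non-empty, 1 otherwise.
check_bt2_index() {
    prefix=$1
    status=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        f="$prefix.$suffix.bt2"
        if [ ! -s "$f" ]; then    # -s: file exists and has size > 0
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, running check_bt2_index fungi_bt2 right after bowtie2-build would flag the two absent .rev files, and a 0-byte .2.bt2 file, before bowtie2 is ever invoked.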
Same here!
I'm trying to build an index from a FASTA file that contains ~360,000 contigs (the file size is 194 MB). I'm using bowtie2 version 2.3.4.1 installed from conda (conda install --yes -c bioconda bowtie2=2.3.4.1) with the following command: bowtie2-build final.contigs.fa contig_index
The program runs with no errors and finishes quickly, but it generates only four files (contig_index.1.bt2 [5.5 MB], contig_index.2.bt2 [0 MB], contig_index.3.bt2 [3.1 MB], and contig_index.4.bt2 [45 MB]) and no .rev.1.bt2 or .rev.2.bt2 files.
Here is the output I get when I try to build the indices:
Settings:
Output files: "contig_index.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
/home/ubuntu/output_169_subset/final.contigs.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:02
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:01
bmax according to bmaxDivN setting: 46852187
Using parameters --bmax 35139141 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 35139141 --dcv 1024
Constructing suffix-array element generator
Any advice? Thanks, Simon
Quick update for people who need a quick fix: installing bowtie2 from the other conda link works (at least bowtie2-build does), although it's version 2.2.6.
Here is the command: conda install -c bioconda/label/broken bowtie2
I probably should have done this before jumping into debugging code, but I just tried building an index with a conda-installed bowtie2 build and got the same output. This issue seems to be isolated to the conda build of bowtie2, as evidenced below:
Conda:
root@a50a2aa0a1ff:/bowtie2# bowtie2-build example/reference/lambda_virus.fa out
Settings:
Output files: "out.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
example/reference/lambda_virus.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 12125
Using parameters --bmax 9094 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 9094 --dcv 1024
Constructing suffix-array element generator
root@a50a2aa0a1ff:/bowtie2# ls out*
out.1.bt2 out.2.bt2 out.3.bt2 out.4.bt2
Local build:
root@a50a2aa0a1ff:/bowtie2# ./bowtie2-build example/reference/lambda_virus.fa out
Settings:
Output files: "out.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
example/reference/lambda_virus.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 12125
Using parameters --bmax 9094 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 9094 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 48502 (target: 9093)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 48502 for bucket 1
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 48503 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 12334
fchr[G]: 23696
fchr[T]: 36516
fchr[$]: 48502
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4210730 bytes to primary EBWT file: out.1.bt2
Wrote 12132 bytes to secondary EBWT file: out.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 48502
bwtLen: 48503
sz: 12126
bwtSz: 12126
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 3032
offsSz: 12128
lineSz: 64
sideSz: 64
sideBwtSz: 48
sideBwtLen: 192
numSides: 253
numLines: 253
ebwtTotLen: 16192
ebwtTotSz: 16192
color: 0
reverse: 0
Total time for call to driver() for forward index: 00:00:00
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to reverse reference sequence: 00:00:00
bmax according to bmaxDivN setting: 12125
Using parameters --bmax 9094 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 9094 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 48502 (target: 9093)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 48502 for bucket 1
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 48503 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 12334
fchr[G]: 23696
fchr[T]: 36516
fchr[$]: 48502
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4210730 bytes to primary EBWT file: out.rev.1.bt2
Wrote 12132 bytes to secondary EBWT file: out.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 48502
bwtLen: 48503
sz: 12126
bwtSz: 12126
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 3032
offsSz: 12128
lineSz: 64
sideSz: 64
sideBwtSz: 48
sideBwtLen: 192
numSides: 253
numLines: 253
ebwtTotLen: 16192
ebwtTotSz: 16192
color: 0
reverse: 1
Total time for backward call to driver() for mirror index: 00:00:00
root@a50a2aa0a1ff:/bowtie2# ls out*
out.1.bt2 out.2.bt2 out.3.bt2 out.4.bt2 out.rev.1.bt2 out.rev.2.bt2 outq.cpp outq.h
We do provide pre-built bowtie2 packages. Any objections to trying those?
I'm using bowtie on an AWS instance and I've had problems with pre-built software before. Any chance of fixing the conda build, or should I get the source code and try to build it myself?
Thanks for looking into this!
I'll see what I can do with regards to the conda build tomorrow. I will keep you posted.
Yeah, my IT department won't grant permissions for installing software myself on our bioinformatics server except through conda or pip, so an updated/working conda install would be very handy.
I put together a script that works within the conda framework to replace the problematic bowtie2 binaries with ones that work. Here's what it does:
- Uses conda-build to build bowtie2 locally (with tbb and libz statically linked), then creates a local conda package
- Deletes your existing bowtie2 runtime and uses the conda install command to replace it with the local build
You can find out more about this process here.
Once the package has been built, conda install will output the following:
The following NEW packages will be INSTALLED:
bowtie2: 2.3.4.1-py27h6bb024c_1 local
Proceed ([y]/n)? y
Make sure that the package to be installed is marked local and not bioconda, as seen above.
Copy the script below verbatim and save it to a file, e.g. bt2_conda.sh. From the command line, execute the script by issuing the command: bash bt2_conda.sh. Let me know if you encounter any issues.
#!/bin/bash
if [[ ! -e `which conda-build` ]]; then
    conda install conda-build
fi
if [[ ! -e `which wget` && ! -e `which curl` ]]; then
    echo "Please make sure that either curl or wget is installed"
    exit 1
fi
function cleanup {
    # Remove only the temporary build directory, by absolute path, so that
    # nothing in the caller's working directory is touched.
    rm -rf /tmp/bowtie2
}
trap cleanup EXIT
# Stop if the build directory cannot be created or entered, so that the
# recipe files are never written into the current directory by mistake.
mkdir -p /tmp/bowtie2 && cd /tmp/bowtie2 || exit 1
cat <<EOF > build.sh
#!/bin/bash
LDFLAGS=""
make static-libs && make RELEASE_BUILD=1
binaries="\
bowtie2 \
bowtie2-align-l \
bowtie2-align-s \
bowtie2-build \
bowtie2-build-l \
bowtie2-build-s \
bowtie2-inspect \
bowtie2-inspect-l \
bowtie2-inspect-s \
"
directories="scripts"
pythonfiles="bowtie2-build bowtie2-inspect"
PY3_BUILD="\${PY_VER%.*}"
if [ \$PY3_BUILD -eq 3 ]; then
    for i in \$pythonfiles; do
        2to3 --write \$i
    done
fi
for i in \$binaries; do
    cp \$i \$PREFIX/bin && chmod +x \$PREFIX/bin/\$i
done
for d in \$directories; do
    cp -r \$d \$PREFIX/bin
done
EOF
cat <<EOF > meta.yaml
{% set version = "2.3.4.1" %}
package:
  name: bowtie2
  version: {{ version }}
source:
  url: http://downloads.sourceforge.net/project/bowtie-bio/bowtie2/{{ version }}/bowtie2-{{ version }}-source.zip
  sha256: a1efef603b91ecc11cfdb822087ae00ecf2dd922e03c85eea1ed7f8230c119dc
  patches:
    - bowtie2.patch
build:
  number: 1
requirements:
  build:
    - {{ compiler('cxx') }}
  host:
    - python
  run:
    - python
    - perl
test:
  commands:
    - bowtie2 --help
    - bowtie2-align-l --help
    - bowtie2-align-s --help
    - bowtie2-build --help
    - bowtie2-build-l --help
    - bowtie2-build-s --help
    - bowtie2-inspect --help
    - bowtie2-inspect-l --help
    - bowtie2-inspect-s --help
about:
  home: 'http://bowtie-bio.sourceforge.net/bowtie2/index.shtml'
  license: GPLv3
  summary: Fast and sensitive read alignment
extra:
  identifiers:
    - biotools:bowtie2
    - doi:10.1038/nmeth.1923
EOF
cat <<EOF > bowtie2.patch
--- bowtie2.orig 2017-01-20 19:41:25.706765000 -0500
+++ bowtie2 2017-01-20 16:23:38.574188000 -0500
@@ -38,10 +38,10 @@
my (\$vol,\$script_path,\$prog);
\$prog = File::Spec->rel2abs( __FILE__ );
-while (-f \$prog && -l \$prog){
- my (undef, \$dir, undef) = File::Spec->splitpath(\$prog);
- \$prog = File::Spec->rel2abs(readlink(\$prog), \$dir);
-}
+#while (-f \$prog && -l \$prog){
+# my (undef, \$dir, undef) = File::Spec->splitpath(\$prog);
+# \$prog = File::Spec->rel2abs(readlink(\$prog), \$dir);
+#}
(\$vol,\$script_path,\$prog)
= File::Spec->splitpath(\$prog);
EOF
cd -
conda-build /tmp/bowtie2
if [[ $? -ne 0 ]]; then
    echo "Build failed... exiting"
    exit 1
fi
echo "Build complete... Uninstalling your current bowtie2 runtime."
echo "Would you like to continue? yes/no?"
read ans
case "$ans" in
    yes)
        conda uninstall bowtie2
        ;;
    *)
        echo Exiting...
        exit 1
        ;;
esac
conda install --use-local bowtie2
Thank you so much! I’ll try it tomorrow morning and report back if I have any problems! Simon
I encountered the same problem. My bowtie2-build is 2.3.4.1 from conda. I actually used a very small FASTA file:
>a1
CATGTCCAGCTTCTCTTCAGTACCGCTCACCAGCCTAGGTGGGACCACTGACTGTGAGTC
TGCAGTGGCCACCGCCCAGTCTCTGTGTCTCAAGCTCCAAGAGACGGTCACACACTAACC
TGCAAGCCAAGGCTGGTGACTTTGACCATCCCTAACGCATGAGTTTTCCATGGAAACCTG
GTCGGTGAACCTGACACGAAATTCCCAATTCCCCTTTACTCTGTACTGTGTGGCTGGTGC
TCTTGTTTTCGTTCTCTCTCTCTCTCTCTCTCTCTCTCAAGTTGATTCCTCCATGTTGCT
TTACAGAGACCTGCCAACTACCCAGGAATGTAAAAGCATTCATAGTATTTGTCTAGTAGA
The two .rev.*.bt2 files are not created. Can anyone help with this issue?
I ended up being able to use the pre-built bowtie2 binary, and it works well. Get it here: https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.4.1
or from the command line:
wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.4.1/bowtie2-2.3.4.1-linux-x86_64.zip/download -O bowtie2-2.3.4.1-linux-x86_64.zip
My research group is having the same (or similar) problem when running bioconda::bowtie2=2.3.4.1 as part of bioconda::humann2=0.11.1. The error is always something like:
Building a SMALL index
Index is corrupt: File size for /tmp/global/LLMGP_82144987094/MI-208-H/MI-208-H_humann2_temp/MI-208-H_bowtie2_index.1.bt2 should have been 132894396 but is actually 0.
Index is corrupt: File size for /tmp/global/LLMGP_82144987094/MI-208-H/MI-208-H_humann2_temp/MI-208-H_bowtie2_index.2.bt2 should have been 63736900 but is actually 0.
Please check if there is a problem with the disk or if disk is full.
Error: Encountered internal Bowtie 2 exception (#1)
...and sometimes the "rev" index files are zero-sized.
This error seems to happen stochastically for us, and usually only when the I/O load is really heavy. If I rerun the "problematic" samples in isolation (instead of 20-30 parallel humann2 jobs), then they usually work. Maybe it's some sort of file latency issue?
I recently had a pull request merged that changes the way conda builds bowtie 2. This new build process will be in effect for our most recent release of bowtie 2, v2.3.4.2. The updated build process is identical to the way we build our bowtie 2 binaries for distribution. The resulting binaries will have all dependencies statically linked which should solve what I think has been the root of this issue, the dynamically linked TBB library. Please give this new version a try and let me know if this problem still persists.
I will be closing this thread. Feel free to reopen if you believe that v2.3.4.2 has not addressed this issue.
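One way to see whether an installed binary still depends on a dynamically linked TBB library is to inspect its ldd output. This is a minimal sketch, assuming a Linux system with ldd available and that the shared library name contains "libtbb"; the function name has_dynamic_tbb is hypothetical. It inspects bowtie2-build-s rather than bowtie2-build, because the latter is a wrapper script.

```shell
# Read `ldd` output on stdin and succeed if any line names a TBB
# shared library (assumption: the library file name contains "libtbb").
has_dynamic_tbb() {
    grep -q 'libtbb'
}

# Example (not run here):
#   ldd "$(command -v bowtie2-build-s)" | has_dynamic_tbb \
#       && echo "TBB is dynamically linked"
```

A binary with all dependencies statically linked should produce no libtbb line.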
Hello,
I am re-opening this thread since I'm not able to get proper index files with either 2.3.3.1 or 2.3.4.1, but I get them with 2.2.9. I am working on a cluster (not through conda). If it matters, the input file contains nearly 1M sequences. The stdout is:
[domeni@r102 new_contig_mappings]$ bowtie2-build thawponds_assembly.fa thawponds_assembly
Settings:
Output files: "thawponds_assembly.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
thawponds_assembly.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:18
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:10
bmax according to bmaxDivN setting: 569603352
Using parameters --bmax 427202514 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 427202514 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:48
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:15
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:28
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 2.27841e+09 (target: 427202513)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
and I get all files but the rev ones. I've read the previous messages in this thread, but it's not clear to me whether I should ask the admins to re-install one of the previous 2.3.x versions or to install 2.3.4.2.
Many thanks,
Domenico
Hello,
How big is your input file? Are you using one of our pre-built binary packages?
Hi,
I still have the problem that bowtie2-build does not create rev files and/or produces segmentation faults. Here are the versions I tested:
- version 2.2.9 (conda create -n bowtie2-broken -c bioconda/label/broken bowtie2)
- segmentation fault, no rev files
- version 2.3.5 (conda create -n bowtie2 -c bioconda bowtie2)
- segmentation fault, no rev files
- version 2.2.9 (bowtie2-2.2.9-linux-x86_64.zip)
- segmentation fault, no rev files
- version 2.3.5 (bowtie2-2.3.5.1-linux-x86_64.zip)
- no segmentation fault, no rev files
Hello @MarieLataretu ,
Can you please share the FASTA and the command line for the index that you are trying to build? This issue seems to happen sporadically and hence has been difficult to debug.
Hi,
this is the input fasta (unzipped):
ftp://ftp.ensembl.org/pub/release-92/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
here we add 'chr' to chromosomes as prefix:
sed -r '/^>/ s/>([1-9MXY])/>chr\1/' Mus_musculus.GRCm38.dna.primary_assembly.fa > Mus_musculus.GRCm38.dna.primary_assembly.chr.fa
and this is the command:
mkdir bowtie2-index
nice [/path/to/]bowtie2-build -t 20 /path/to/genome/Mus_musculus.GRCm38.dna.primary_assembly.chr.fa bowtie2-index/Mus_musculus.GRCm38.dna.primary_assembly.chr &> bowtie2-index/bowtie2-build.log
I have the same problem without the prefix and when the fasta file is in the same directory as the output.
If you have the bowtie2-build-s-debug binary available, can you try using it when building the index? I'd also appreciate it if you could save and share the debug output with me.
Sure!
the command is now:
nice /path/to/bowtie2-build-s-debug -t 20 /path/to/Mus_musculus.GRCm38.dna.primary_assembly.chr.fa bowtie2-index/Mus_musculus.GRCm38.dna.primary_assembly.chr.debug &> bowtie2-index/debug.log
and the resulting log file:
Settings:
Output files: "bowtie2-index/Mus_musculus.GRCm38.dna.primary_assembly.debug.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 20
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: enabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
/data/fass1/genomes/Eukaryots/mus_musculus_done/03052018/Mus_musculus.GRCm38.dna.primary_assembly.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:47
assert_leq: expected (20) <= (16)
bt2_idx.h:208
bowtie2-build-s-debug: bt2_idx.h:208: bool EbwtParams::repOk() const: Assertion `0' failed.
Only the .3.bt2 and .4.bt2 index files were created (before, only the rev files were missing); the running time dropped from > 10 minutes to ~1 minute.
Thanks for looking into this!
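A possible reading of the assertion above, offered as a hedged suggestion rather than a confirmed diagnosis: in the bowtie2-build usage text, -t is the short form of --ftabchars, and the number of threads is set with --threads. The debug log reports "FTable chars: 20" and the assertion expects a value <= 16, which is consistent with -t 20 having been taken as the ftab size rather than a thread count. The helper name bt2_build_threads below is hypothetical; it only shows the long option spelled out.

```shell
# Build an index with N worker threads, using the long --threads option
# so that the value cannot be mistaken for -t/--ftabchars.
# Usage: bt2_build_threads N reference.fa index_prefix
bt2_build_threads() {
    bowtie2-build --threads "$1" "$2" "$3"
}
```

Whether this also explains the missing rev files with the non-debug binaries is not established in this thread, but it may be worth ruling out by rerunning without -t 20.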