
bowtie2-build runs without error but without constructing .rev files

Open AnnaSOFI opened this issue 6 years ago • 68 comments

Hi,

I have just started learning bowtie2 and ran into a problem indexing a database with the bowtie2-build command: indexing runs to completion, but the output has no .rev files, so I cannot align against this database. The file to be indexed is a FASTA file containing a collection of bacterial genomes. I have previously indexed and aligned against a database of the same kind (merged FASTA files) without problems, so I have no clue what is wrong with my settings. I get the 4 other files in reasonable time, although the software performed multiple memory-usage tests before passing.

I could not find any help online regarding this question. I wonder what I'm doing wrong?

BRs,

Anna

AnnaSOFI avatar Jun 26 '18 12:06 AnnaSOFI

Would it be possible to share the FASTA?

ch4rr0 avatar Jun 26 '18 13:06 ch4rr0

Hi,

The FASTA file is too big to post here, but it has been downloaded from here:

https://www.hmpdacc.org/hmp/catalog/grid.php?dataset=genomic&hmp_isolation_body_site=gastrointestinal_tract

Edit. I could also send the file if this is more convenient?

Anna

AnnaSOFI avatar Jun 27 '18 09:06 AnnaSOFI

Hi, I'm having a very similar problem trying to run bowtie2-build on two different FASTAs (one from metaphlan2's marker database and another that was built from chocophlan). Was a solution ever found? I'm not actually getting an error message, but I think it's stopping before it actually finishes the index (see below for the output). This is on version 2.3.4.1, which I installed with bioconda.

INFO: Command: bowtie2-build-s --wrapper basic-0 -f Humann2Test/A1HF_humann2_temp/A1HF_custom_chocophlan_database.ffn Humann2Test/A1HF_humann2_temp/A1HF_bowtie2_index
Settings:
  Output files: "Humann2Test/A1HF_humann2_temp/A1HF_bowtie2_index.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  Humann2Test/A1HF_humann2_temp/A1HF_custom_chocophlan_database.ffn
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 6990730
Using parameters --bmax 5243048 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 5243048 --dcv 1024
Constructing suffix-array element generator

If a solution or workaround was found, I'd be very interested to know. Thanks for your help, Chris

whidbeyc avatar Jul 03 '18 04:07 whidbeyc

Hi,

I did not get official advice on this one, but now it is working. Just yesterday I split my enormous FASTA file in two (8.4GB --> 4.2GB), and indexing now seems to work, at least for the first part (I have not run the alignment yet, but the .rev files appeared once indexing was complete)! Could you try this approach as well? ☺ I think I'm going to align my data against both FASTA files and then combine the alignment SAM files.

BRs,

Anna Sorjamaa


AnnaSOFI avatar Jul 03 '18 06:07 AnnaSOFI

Hi, I'm having exactly this problem as well - I installed bowtie2 automatically as part of humann2 (version 2.3.4.1), and bowtie2-build will not build .rev.bt2 files from any fasta file I give it, including as part of the MetaPhlAn pipeline indexing the database, but doesn't throw any errors either. It also produces a .2.bt2 file of 0kb, and (for the failed MetaPhlAn database index, where I have a comparison) the other files are smaller than the correctly-indexed database files provided by the makers of that software.

Really keen to figure out if this is a bowtie problem, a server problem, a data problem, or something else!

This is the output I get when I try:

Settings:
  Output files: "04_MAPPING/contigs.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  03_CONTIGS/contigs.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:01
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:01
bmax according to bmaxDivN setting: 17568529
Using parameters --bmax 13176397 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 13176397 --dcv 1024
Constructing suffix-array element generator

lcstewart avatar Jul 10 '18 01:07 lcstewart

I have already begun looking into this. I will update this thread as soon as I have more information.

ch4rr0 avatar Jul 10 '18 01:07 ch4rr0

Quick update: I installed versions 2.2.4 and 2.3.0 in conda environments and tried indexing the same file. Version 2.2.4 created all 6 output files without a problem; version 2.3.0 stopped at the same point as 2.3.4.1 and created the same 4 too-small files, but gave an error Segmentation fault (core dumped) when it did.

lcstewart avatar Jul 10 '18 02:07 lcstewart

@whidbeyc -- can you share your command line? How big is your index? @AnnaSOFI -- I am not able to access the files on the website you provided. Would it be possible to share the link some other way? Maybe a dropbox link?

ch4rr0 avatar Jul 10 '18 16:07 ch4rr0

Same problem here. I installed bowtie2 2.3.4.1 with conda and built the index of a fungal genome (~10^7 bp) with no errors. The commands I used:

bowtie2-build -f fungi.fa fungi_bt2
bowtie2 --local -x fungi_bt2 -f -U contigs.fasta -S bowtie.contigs.fungi.sam -p 1

The error I got:

Could not open index file fungi_bt2.rev.1.bt2
Could not open index file fungi_bt2.rev.2.bt2
Segmentation fault (core dumped)
(ERR): bowtie2-align exited with value 139

bbsunchen avatar Jul 13 '18 19:07 bbsunchen

Same here! I'm trying to build an index from a FASTA file that contains ~360,000 contigs (the file size is 194Mb). I'm using bowtie2 version 2.3.4.1 installed from conda (conda install --yes -c bioconda bowtie2=2.3.4.1) with the following command: bowtie2-build final.contigs.fa contig_index

The program runs with no errors and finishes quickly, but it generates only four files (contig_index.1.bt2 [5.5Mb], contig_index.2.bt2 [0Mb], contig_index.3.bt2 [3.1Mb], and contig_index.4.bt2 [45Mb]) and no .rev.1.bt2 or .rev.2.bt2 files.
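The symptom described here (only four files, one of them empty, and no .rev files) can be checked mechanically before attempting an alignment. Below is a minimal sketch, assuming the six-file layout that appears elsewhere in this thread (.1 to .4 plus .rev.1 and .rev.2). The function name check_bt2_index is made up for illustration and is not part of bowtie2.

```shell
# Illustrative helper (not part of bowtie2): report whether an index prefix
# has all six expected files, each present and non-empty.
# Returns 0 if the index looks complete, 1 otherwise.
check_bt2_index() {
    local prefix="$1"
    local ext="${2:-bt2}"   # pass "bt2l" as second argument for a large index
    local status=0 part file
    for part in 1 2 3 4 rev.1 rev.2; do
        file="${prefix}.${part}.${ext}"
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$file" ]; then
            echo "missing or empty: $file" >&2
            status=1
        fi
    done
    return "$status"
}
```

With the index above, `check_bt2_index contig_index` would be expected to flag contig_index.2.bt2 and both .rev files.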

Here is the output I get when I try to build the indices:

Settings:
  Output files: "contig_index.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /home/ubuntu/output_169_subset/final.contigs.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:02
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:01
bmax according to bmaxDivN setting: 46852187
Using parameters --bmax 35139141 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 35139141 --dcv 1024
Constructing suffix-array element generator

Any advice? Thanks, Simon

uribe-convers avatar Jul 20 '18 00:07 uribe-convers

Quick update for people who need a quick fix: installing bowtie2 from the other conda label works (at least bowtie2-build does), although it's version 2.2.6.

Here is the command: conda install -c bioconda/label/broken bowtie2

uribe-convers avatar Jul 20 '18 00:07 uribe-convers

I probably should have done this before jumping into debugging code, but I just tried building an index with a conda-installed bowtie2 build and got the same output. This issue seems to be isolated to the conda build of bowtie2, as evidenced below:

Conda:

root@a50a2aa0a1ff:/bowtie2# bowtie2-build example/reference/lambda_virus.fa out
Settings:
  Output files: "out.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  example/reference/lambda_virus.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 12125
Using parameters --bmax 9094 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 9094 --dcv 1024
Constructing suffix-array element generator
root@a50a2aa0a1ff:/bowtie2# ls out*
out.1.bt2  out.2.bt2  out.3.bt2  out.4.bt2

Local build:

root@a50a2aa0a1ff:/bowtie2# ./bowtie2-build example/reference/lambda_virus.fa out
Settings:
  Output files: "out.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  example/reference/lambda_virus.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
bmax according to bmaxDivN setting: 12125
Using parameters --bmax 9094 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 9094 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 48502 (target: 9093)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 48502 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 48503 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 12334
fchr[G]: 23696
fchr[T]: 36516
fchr[$]: 48502
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4210730 bytes to primary EBWT file: out.1.bt2
Wrote 12132 bytes to secondary EBWT file: out.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 48502
    bwtLen: 48503
    sz: 12126
    bwtSz: 12126
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 3032
    offsSz: 12128
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 253
    numLines: 253
    ebwtTotLen: 16192
    ebwtTotSz: 16192
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 00:00:00
Reading reference sizes
  Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:00
  Time to reverse reference sequence: 00:00:00
bmax according to bmaxDivN setting: 12125
Using parameters --bmax 9094 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 9094 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:00
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:00
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:00
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 48502 (target: 9093)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block
  Sorting block of length 48502 for bucket 1
  (Using difference cover)
  Sorting block time: 00:00:00
Returning block of 48503 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 12334
fchr[G]: 23696
fchr[T]: 36516
fchr[$]: 48502
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4210730 bytes to primary EBWT file: out.rev.1.bt2
Wrote 12132 bytes to secondary EBWT file: out.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 48502
    bwtLen: 48503
    sz: 12126
    bwtSz: 12126
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 3032
    offsSz: 12128
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 253
    numLines: 253
    ebwtTotLen: 16192
    ebwtTotSz: 16192
    color: 0
    reverse: 1
Total time for backward call to driver() for mirror index: 00:00:00
root@a50a2aa0a1ff:/bowtie2# ls out*
out.1.bt2  out.2.bt2  out.3.bt2  out.4.bt2  out.rev.1.bt2  out.rev.2.bt2  outq.cpp  outq.h

ch4rr0 avatar Jul 20 '18 01:07 ch4rr0

We do provide pre-built bowtie2 packages. Any objections to trying those?

ch4rr0 avatar Jul 20 '18 01:07 ch4rr0

I’m using bowtie on an instance on AWS and I’ve had problems with pre-built software before. Any chance of fixing the conda build, or should I get the source code and try to build it myself?

Thanks for looking into this!


uribe-convers avatar Jul 20 '18 02:07 uribe-convers

I'll see what I can do with regards to the conda build tomorrow. I will keep you posted.

ch4rr0 avatar Jul 20 '18 02:07 ch4rr0

Yeah, my IT department won't grant permissions for installing software myself on our bioinformatics server except through conda or pip, so an updated/working conda install would be very handy.

lcstewart avatar Jul 20 '18 03:07 lcstewart

I put together a script that works within the conda framework to replace the problematic bowtie2 binaries with ones that work. Here's what it does:

  • Uses conda-build to build bowtie2 locally (with tbb and libz statically linked) then creates a local conda package
  • Deletes your existing bowtie2 runtime and uses the conda install command to replace it with the local build

You can find out more about this process here.

Once the package has been built, conda install will output the following:

The following NEW packages will be INSTALLED:

    bowtie2: 2.3.4.1-py27h6bb024c_1 local

Proceed ([y]/n)? y

Make sure that the package to be installed is marked local and not bioconda, as seen above.

Copy the script below verbatim and save it to a file, e.g. bt2_conda.sh. Then run it from the command line with: bash bt2_conda.sh. Let me know if you encounter any issues.

#!/bin/bash

if [[ ! -e `which conda-build` ]]; then
    conda install conda-build
fi

if [[ ! -e `which wget` && ! -e `which curl` ]]; then
    echo "Please make sure that either curl or wget is installed"
    exit 1
fi

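function cleanup {
    # Remove the temporary build directory and everything in it.
    # Deleting the listed names one by one would resolve them against the
    # caller's working directory, because the script runs "cd -" before exit.
    rm -rf /tmp/bowtie2
}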

trap cleanup EXIT

mkdir /tmp/bowtie2 && cd /tmp/bowtie2
cat <<EOF > build.sh
#!/bin/bash
LDFLAGS=""
make static-libs && make RELEASE_BUILD=1

binaries="\
bowtie2 \
bowtie2-align-l \
bowtie2-align-s \
bowtie2-build \
bowtie2-build-l \
bowtie2-build-s \
bowtie2-inspect \
bowtie2-inspect-l \
bowtie2-inspect-s \
"
directories="scripts"
pythonfiles="bowtie2-build bowtie2-inspect"

PY3_BUILD="\${PY_VER%.*}"

if [ \$PY3_BUILD -eq 3 ]; then
    for i in \$pythonfiles; do
        2to3 --write \$i
    done
fi

for i in \$binaries; do
    cp \$i \$PREFIX/bin && chmod +x \$PREFIX/bin/\$i
done

for d in \$directories; do
    cp -r \$d \$PREFIX/bin
done
EOF

cat <<EOF > meta.yaml
{% set version = "2.3.4.1" %}


package:
  name: bowtie2
  version: {{ version }}

source:
  url: http://downloads.sourceforge.net/project/bowtie-bio/bowtie2/{{ version }}/bowtie2-{{ version }}-source.zip
  sha256: a1efef603b91ecc11cfdb822087ae00ecf2dd922e03c85eea1ed7f8230c119dc
  patches:
    - bowtie2.patch

build:
  number: 1

requirements:
  build:
    - {{ compiler('cxx') }}
  host:
    - python
  run:
    - python
    - perl

test:
  commands:
    - bowtie2 --help
    - bowtie2-align-l --help
    - bowtie2-align-s --help
    - bowtie2-build --help
    - bowtie2-build-l --help
    - bowtie2-build-s --help
    - bowtie2-inspect --help
    - bowtie2-inspect-l --help
    - bowtie2-inspect-s --help

about:
  home: 'http://bowtie-bio.sourceforge.net/bowtie2/index.shtml'
  license: GPLv3
  summary: Fast and sensitive read alignment

extra:
  identifiers:
    - biotools:bowtie2
    - doi:10.1038/nmeth.1923
EOF

cat <<EOF > bowtie2.patch
--- bowtie2.orig        2017-01-20 19:41:25.706765000 -0500
+++ bowtie2     2017-01-20 16:23:38.574188000 -0500
@@ -38,10 +38,10 @@
  my (\$vol,\$script_path,\$prog);
  \$prog = File::Spec->rel2abs( __FILE__ );

-while (-f \$prog && -l \$prog){
-    my (undef, \$dir, undef) = File::Spec->splitpath(\$prog);
-    \$prog = File::Spec->rel2abs(readlink(\$prog), \$dir);
-}
+#while (-f \$prog && -l \$prog){
+#    my (undef, \$dir, undef) = File::Spec->splitpath(\$prog);
+#    \$prog = File::Spec->rel2abs(readlink(\$prog), \$dir);
+#}

  (\$vol,\$script_path,\$prog)
                  = File::Spec->splitpath(\$prog);
EOF

cd -

conda-build /tmp/bowtie2

if [[ $? -ne 0 ]]; then
    echo "Build failed... exiting"
    exit 1
fi

echo "Build complete... Uninstalling your current bowtie2 runtime."
echo "Would you like to continue? yes/no?"
read ans
case "$ans" in
    yes)
        conda uninstall bowtie2
        ;;
    *)
        echo Exiting...
        exit 1
        ;;
esac
conda install --use-local bowtie2

ch4rr0 avatar Jul 20 '18 22:07 ch4rr0

Thank you so much! I’ll try it tomorrow morning and report back if I have any problems! Simon

uribe-convers avatar Jul 23 '18 01:07 uribe-convers

I encountered the same problem. My bowtie2-build is version 2.3.4.1 from conda. I actually used a very small FASTA file:

a1 CATGTCCAGCTTCTCTTCAGTACCGCTCACCAGCCTAGGTGGGACCACTGACTGTGAGTC TGCAGTGGCCACCGCCCAGTCTCTGTGTCTCAAGCTCCAAGAGACGGTCACACACTAACC TGCAAGCCAAGGCTGGTGACTTTGACCATCCCTAACGCATGAGTTTTCCATGGAAACCTG GTCGGTGAACCTGACACGAAATTCCCAATTCCCCTTTACTCTGTACTGTGTGGCTGGTGC TCTTGTTTTCGTTCTCTCTCTCTCTCTCTCTCTCTCTCAAGTTGATTCCTCCATGTTGCT TTACAGAGACCTGCCAACTACCCAGGAATGTAAAAGCATTCATAGTATTTGTCTAGTAGA

The two .rev.*.bt2 files are not created. Can anyone help with this issue?

microsat2018 avatar Jul 26 '18 21:07 microsat2018

I ended up being able to use the pre-built bowtie2 binary, and it works well. Get it here: https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.4.1

or from the command line

wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.4.1/bowtie2-2.3.4.1-linux-x86_64.zip/download -O bowtie2-2.3.4.1-linux-x86_64.zip

uribe-convers avatar Jul 26 '18 21:07 uribe-convers

My research group is having the same (or similar) problem when running bioconda::bowtie2=2.3.4.1 as part of bioconda::humann2=0.11.1. The error is always something like:

Building a SMALL index
Index is corrupt: File size for /tmp/global/LLMGP_82144987094/MI-208-H/MI-208-H_humann2_temp/MI-208-H_bowtie2_index.1.bt2 should have been 132894396 but is actually 0.
Index is corrupt: File size for /tmp/global/LLMGP_82144987094/MI-208-H/MI-208-H_humann2_temp/MI-208-H_bowtie2_index.2.bt2 should have been 63736900 but is actually 0.
Please check if there is a problem with the disk or if disk is full.
Error: Encountered internal Bowtie 2 exception (#1)

...and sometimes the "rev" index files are zero-sized.

This error seems to happen stochastically for us, and usually only when the I/O load is really heavy. If I rerun the "problematic" samples in isolation (instead of 20-30 parallel humann2 jobs), then they usually work. Maybe it's some sort of file latency issue?
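Since rerunning usually works, one possible stopgap is to retry the build a few times and accept it only once the output looks right. This is a workaround sketch and does not fix the underlying cause. The helper name retry and the file names in the usage note are hypothetical.

```shell
# Illustrative helper: run a command up to N times, stopping at the first
# success. Usage: retry N command [args...]
retry() {
    local attempts="$1" i=1
    shift
    while [ "$i" -le "$attempts" ]; do
        # "$@" is the command to run; its exit status decides success
        if "$@"; then
            return 0
        fi
        echo "attempt $i of $attempts failed" >&2
        i=$((i + 1))
    done
    return 1
}
```

Because bowtie2-build can exit without an error while leaving files empty, the retried command should include its own output check, for example: `retry 3 sh -c 'bowtie2-build ref.fa idx && test -s idx.1.bt2 && test -s idx.rev.1.bt2'` (ref.fa and idx are placeholder names).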

nick-youngblut avatar Aug 08 '18 09:08 nick-youngblut

I recently had a pull request merged that changes the way conda builds bowtie 2. This new build process will be in effect for our most recent release of bowtie 2, v2.3.4.2. The updated build process is identical to the way we build our bowtie 2 binaries for distribution. The resulting binaries will have all dependencies statically linked which should solve what I think has been the root of this issue, the dynamically linked TBB library. Please give this new version a try and let me know if this problem still persists.
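To see whether an installed binary is affected by the dynamic TBB linking described above, one option is to look at its shared-library dependencies. Below is a minimal sketch, assuming a Linux system where ldd is available. The function names are made up for illustration. Note that bowtie2-build itself is a wrapper script, so the check applies to the compiled binaries such as bowtie2-build-s.

```shell
# Illustrative helper: succeeds if the ldd-style text on stdin mentions
# a TBB shared library.
mentions_tbb() {
    grep -q 'libtbb'
}

# Illustrative helper: succeeds if the given binary is dynamically linked
# against TBB. A statically linked build should not list libtbb at all.
links_tbb_dynamically() {
    ldd "$1" 2>/dev/null | mentions_tbb
}
```

For example, `links_tbb_dynamically "$(command -v bowtie2-build-s)" && echo "TBB is dynamically linked"`.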

ch4rr0 avatar Aug 08 '18 15:08 ch4rr0

I will be closing this thread. Feel free to reopen if you believe that v2.3.4.2 has not addressed this issue.

ch4rr0 avatar Aug 10 '18 13:08 ch4rr0

Hello,

I am re-opening this thread since I'm not able to get proper index files with either 2.3.3.1 or 2.3.4.1, but I get them with 2.2.9. I am working on a cluster (not through conda). If it matters, the input file contains nearly 1M sequences. The stdout is

[domeni@r102 new_contig_mappings]$ bowtie2-build thawponds_assembly.fa thawponds_assembly
Settings:
  Output files: "thawponds_assembly.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  thawponds_assembly.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:18
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:10
bmax according to bmaxDivN setting: 569603352
Using parameters --bmax 427202514 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 427202514 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:00:48
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:15
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:28
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 2.27841e+09 (target: 427202513)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
  No samples; assembling all-inclusive block

and I get all files but the rev ones. I've read the previous messages in this thread, but it's not clear to me whether I should ask the admins to re-install the previous 2.3.x versions or to install 2.3.4.2.

Many thanks,

Domenico

domenico-simone avatar Sep 17 '18 16:09 domenico-simone

Hello,

How big is your input file? Are you using one of our pre-built binary packages?

ch4rr0 avatar Sep 17 '18 20:09 ch4rr0

Hi,

I still have the problem that bowtie2-build does not create rev files and/or produces segmentation faults. Here are the versions I tested:

  • version 2.2.9 (conda create -n bowtie2-broken -c bioconda/label/broken bowtie2)
    • segmentation fault, no rev files
  • version 2.3.5 (conda create -n bowtie2 -c bioconda bowtie2)
    • segmentation fault, no rev files
  • version 2.2.9 (bowtie2-2.2.9-linux-x86_64.zip)
    • segmentation fault, no rev files
  • version 2.3.5 (bowtie2-2.3.5.1-linux-x86_64.zip)
    • no segmentation fault, no rev files

MarieLataretu avatar Dec 03 '19 12:12 MarieLataretu

Hello @MarieLataretu ,

Can you please share the FASTA and the command line for the index that you are trying to build? This issue seems to happen sporadically and hence has been difficult to debug.

ch4rr0 avatar Dec 03 '19 14:12 ch4rr0

Hi,

this is the input fasta (unzipped):

ftp://ftp.ensembl.org/pub/release-92/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

here we add 'chr' to chromosomes as prefix:

sed -r '/^>/ s/>([1-9MXY])/>chr\1/' Mus_musculus.GRCm38.dna.primary_assembly.fa > Mus_musculus.GRCm38.dna.primary_assembly.chr.fa
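To make the effect of this sed command concrete, here is a small sketch that wraps it in a function and can be fed a few header lines. The function name add_chr_prefix and the sample headers are made up for illustration. The -r flag is the extended-regex option of GNU sed.

```shell
# Illustrative wrapper around the sed command from the post: on header lines
# (those starting with ">"), insert "chr" before a name that begins with
# 1-9, M, X or Y. Sequence lines and other headers pass through unchanged.
add_chr_prefix() {
    sed -r '/^>/ s/>([1-9MXY])/>chr\1/'
}
```

For example, `printf '>1 dna\nACGT\n>MT dna\n>scaffold_example\n' | add_chr_prefix` turns ">1 dna" into ">chr1 dna" and ">MT dna" into ">chrMT dna", and leaves "ACGT" and ">scaffold_example" as they are.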

and this is the command:

mkdir bowtie2-index
nice [/path/to/]bowtie2-build -t 20 /path/to/genome/Mus_musculus.GRCm38.dna.primary_assembly.chr.fa bowtie2-index/Mus_musculus.GRCm38.dna.primary_assembly.chr &> bowtie2-index/bowtie2-build.log

I have the same problem without the prefix and when the fasta file is in the same directory as the output.

MarieLataretu avatar Dec 04 '19 08:12 MarieLataretu

If you have the bowtie2-build-s-debug binary available can you try using it when building the index? I'd also appreciate it if you can save and share the debug output with me.

ch4rr0 avatar Dec 04 '19 13:12 ch4rr0

Sure!

the command is now:

nice /path/to/bowtie2-build-s-debug -t 20 /path/to/Mus_musculus.GRCm38.dna.primary_assembly.chr.fa bowtie2-index/Mus_musculus.GRCm38.dna.primary_assembly.chr.debug &> bowtie2-index/debug.log

and the resulting log file:

Settings:
  Output files: "bowtie2-index/Mus_musculus.GRCm38.dna.primary_assembly.debug.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 20
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: enabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /data/fass1/genomes/Eukaryots/mus_musculus_done/03052018/Mus_musculus.GRCm38.dna.primary_assembly.fa
Building a SMALL index
Reading reference sizes
  Time reading reference sizes: 00:00:47
assert_leq: expected (20) <= (16)
bt2_idx.h:208
bowtie2-build-s-debug: bt2_idx.h:208: bool EbwtParams::repOk() const: Assertion `0' failed.

Only the .3 and .4 index files were created (before, only the rev files were missing); the running time dropped from > 10 minutes to ~ 1 minute.
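One detail in the debug log may be relevant: "FTable chars: 20" matches the value passed with -t, and the assertion reports that 20 exceeds the allowed maximum of 16. In bowtie2-build, -t is the short form of --ftabchars, while worker threads are requested with --threads. Whether this also explains the missing .rev files in the earlier, non-debug runs is not confirmed in this thread. A minimal sketch of the command with the long thread option follows. The function name bt2_build_cmd is made up, and it only prints the command rather than running it.

```shell
# Illustrative helper: print (do not run) a bowtie2-build command line that
# requests threads with --threads instead of -t (which sets --ftabchars).
bt2_build_cmd() {
    local threads="$1" fasta="$2" prefix="$3"
    printf 'bowtie2-build --threads %s %s %s\n' "$threads" "$fasta" "$prefix"
}
```

For example, `bt2_build_cmd 20 genome.chr.fa bowtie2-index/genome.chr` prints the command to run, where the FASTA and prefix names are placeholders.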

Thanks for looking into this!

MarieLataretu avatar Dec 04 '19 14:12 MarieLataretu