Support for BLAST databases

Open bbuchfink opened this issue 4 years ago • 27 comments

Hey @tillea @mr-c pinging you since I'm about to release a new feature for Diamond to directly read BLAST databases. I'm doing this by linking against the shared libraries from NCBI, all of which are contained in the ncbi-blast+ debian package. However, the header files needed for compilation are not contained in any debian package.

My current procedure is to download the source tarball from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ and run configure and make install to get the headers. Needless to say, this is cumbersome, especially since you also need to go through the BLAST build process to get usable headers.

So, it would be great if these headers could be included in a debian package. Appreciate anything you can do.

bbuchfink avatar Mar 09 '21 09:03 bbuchfink

Hey @bbuchfink ; thanks for letting us know. Can you file a bug against the source package ncbi-blast+ ? https://bugs.debian.org/cgi-bin/pkgreport.cgi?archive=0;dist=unstable;ordering=normal;repeatmerged=0;src=ncbi-blast%2B

mr-c avatar Mar 09 '21 11:03 mr-c

done!

bbuchfink avatar Mar 09 '21 14:03 bbuchfink

Can you comment on performance when using BLAST databases instead of .dmnd?

sjaenick avatar Mar 10 '21 11:03 sjaenick

Loading in the database sequences may still take 10%-20% longer when using a BLAST db, but the overall impact on performance should be minimal.

bbuchfink avatar Mar 10 '21 12:03 bbuchfink

Hello, I failed when trying to compile Diamond (my OS is quite old, and I cannot use the prebuilt binaries: 'GLIBC_2.17' not found...). So I managed to install 2.0.8 using Conda, but I get: Error: This executable was not compiled with support for BLAST databases. Would it be possible to have a Conda version that is ready for BLAST database support? Thank you.

FredericBGA avatar Mar 17 '21 07:03 FredericBGA

It is planned but could still take some time. Can you tell me what error you are getting when compiling from source? It should be possible to fix this.

bbuchfink avatar Mar 17 '21 09:03 bbuchfink

Here is a gist file with my compilation issues: Diamond 2.0.8 compilation issues. (CentOS release 6.6 (Final))

I hope you will be able to help me. Otherwise I will wait for the conda version. Thank you.

FredericBGA avatar Mar 19 '21 09:03 FredericBGA

I don't really understand the cause of this error; there seems to be some problem with the compiler setup on your system. One thing you could try is to compile with a custom GCC as described here: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics#compiling-with-custom-gcc

bbuchfink avatar Mar 19 '21 09:03 bbuchfink

Thank you for your help. I've installed GCC 10.2.0 without seeing any errors, but the compilation still fails with the same type of errors. So either I have an issue with the OS (maybe the version of as, now that gcc has been upgraded?) or I am missing something obvious. https://gist.github.com/FredericBGA/c696199937b6121959924ac040008d00 I will wait for the Conda version.

FredericBGA avatar Mar 22 '21 08:03 FredericBGA

I used a SEED fasta file and turned it into a blast database and while running it seems to look ok but after a while it encounters an error...

Loading query sequences...  [15.627s]
Masking queries...  [6.019s]
Building query seed set...  [0s]
Building query histograms...  [1.424s]
Allocating buffers...  [0s]
Loading reference sequences...  [0s]
Error: NCBI C++ Exception:
    T0 "/root/ncbi-blast-2.11.0+-src/c++/src/objtools/blast/seqdb_reader/seqdbimpl.cpp", line 843: Error: (CSeqDBException::eArgErr) BLASTDB::ncbi::CSeqDBImpl::GetSeqIDs() - OID not found

Any clue what could be happening here? Is it the input fasta file that needs to conform to some format?

jjkoehorst avatar Mar 24 '21 07:03 jjkoehorst

Did you just run makeblastdb on a fasta file or something else? It may not work yet for aliased databases.

bbuchfink avatar Mar 24 '21 08:03 bbuchfink

The following command was used:

ncbi-blast-2.11.0+/bin/makeblastdb -dbtype prot -in seed_subsystems_db.fa -title SEED_subsystems

jjkoehorst avatar Mar 24 '21 09:03 jjkoehorst

It works for me when using a BLAST db created by makeblastdb. This error likely means the database has a sparse OID range for some reason, like alias databases do. I should be able to provide a fix for this shortly.

bbuchfink avatar Mar 24 '21 14:03 bbuchfink
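
A minimal sketch of the build-then-search workflow discussed above, assuming makeblastdb and diamond are on the PATH. The function name make_protein_blastdb and the prefix seed_subsystems are hypothetical. Passing -out explicitly is a choice made here: without it, makeblastdb names the database files after the input file, so the command quoted above would produce a database named seed_subsystems_db.fa.

```shell
# Hypothetical helper: build a protein BLAST database from a FASTA file
# under an explicit prefix (the name later passed to diamond via -d).
make_protein_blastdb() {
    fasta=$1
    prefix=$2
    makeblastdb -dbtype prot -in "$fasta" -out "$prefix"
}

# Example (not executed here; file names are placeholders):
#   make_protein_blastdb seed_subsystems_db.fa seed_subsystems
#   diamond blastp -d seed_subsystems -q queries.fasta -o matches.tsv
```

This only illustrates the naming of the database prefix; it does not address the sparse OID range error reported above.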

hello @jjkoehorst

Could you tell me where I can download seed_subsystems_db.fa, since ftp.theseed.org is not accessible?

Lix1993 avatar Apr 06 '21 07:04 Lix1993

@jjkoehorst I think this should be fixed in the 2.0.9 release now.

bbuchfink avatar Apr 12 '21 18:04 bbuchfink

I can confirm that with the update it runs without an error. Thanks a lot for fixing this so fast!

@Lix1993 It is indeed true that ftp.theseed.org has unfortunately been unavailable for quite some time. I have contacted SEED but got no response. A copy can be found here (although it is not very recent): https://bioshare.bioinformatics.ucdavis.edu/bioshare/download/2c8s521xj9907hn/subsys_db.fa

(It comes from the samsa2 pipeline https://github.com/transcript/samsa2)

bartns avatar Apr 15 '21 10:04 bartns

thanks

Lix1993 avatar Apr 15 '21 10:04 Lix1993

Hi, I'm having trouble with this feature. I tried the workflow in the Wiki:

downloading and using a BLAST database

update_blastdb.pl --decompress --blastdb_version 5 swissprot
./diamond blastp -d swissprot -q queries.fasta -o matches.tsv

I tried this with a bioconda install of version 2.0.9 as well as an installation straight from the Github source and in both instances got "Error: This executable was not compiled with support for BLAST databases."

Is the conclusion from the above that you have to make the blastdb yourself? Or have I missed something?

nwheeler443 avatar Apr 16 '21 07:04 nwheeler443

The conda version does not yet support this, but you can download the prebuilt binary: http://github.com/bbuchfink/diamond/releases/download/v2.0.9/diamond-linux64.tar.gz

When compiling from source, some additional steps need to be taken to enable blast db support (see the installation page).

bbuchfink avatar Apr 16 '21 07:04 bbuchfink

Ahh great, that seems to be working! Thanks!

nwheeler443 avatar Apr 16 '21 07:04 nwheeler443

Hi, a quick reply to say that I managed to compile diamond. My issue was related to a lack of binutils tools.

FredericBGA avatar Jun 15 '21 12:06 FredericBGA

Since v2.0.10, using a BLAST database requires a diamond prepdb call. In return, performance for loading the database has been greatly improved, and using a BLAST database should now be substantially faster than using a .dmnd file.

bbuchfink avatar Jul 05 '21 09:07 bbuchfink
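
A minimal sketch of the workflow from v2.0.10 on, assuming a diamond binary built with BLAST database support is on the PATH. The function name search_blastdb is hypothetical; the prepdb and blastp invocations use only options that appear in this thread. Running prepdb on every call is a simplification made here.

```shell
# Hypothetical helper: prepare a BLAST database for diamond, then search it.
# $1 = database prefix (no file extension), $2 = query FASTA, $3 = output file
search_blastdb() {
    db=$1
    queries=$2
    out=$3
    diamond prepdb -d "$db" || return 1
    diamond blastp -d "$db" -q "$queries" -o "$out"
}

# Example (not executed here; file names are placeholders):
#   search_blastdb swissprot queries.fasta matches.tsv
```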

I get "No alias or index file found for protein database" when I run prepdb on the pre-formatted nr BLAST protein database. What am I doing wrong?

JamesWZM avatar Sep 01 '21 16:09 JamesWZM

I'm also getting the no alias/index error when calling diamond prepdb -d nr on my newly downloaded nr db:

Error: NCBI C++ Exception:
    T0 "/root/ncbi-blast-2.11.0+-src/c++/src/objtools/blast/seqdb_reader/seqdbalias.cpp", line 320: Error: (CSeqDBException::eFileErr) BLASTDB::ncbi::CSeqDBAliasNode::x_ResolveNames() - No alias or index file found for protein database [nr.fa] in search path

lmolokin avatar Sep 14 '21 20:09 lmolokin

You are probably not specifying the correct path. Use the directory where you downloaded the files followed by /nr, without any file extension.

bbuchfink avatar Sep 20 '21 09:09 bbuchfink
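
The path rule above can be checked before calling diamond. This is a minimal sketch, assuming the usual protein BLAST database layout in which the prefix has either an alias file (.pal) or an index file (.pin) next to it. The function name is hypothetical, and the check is only a heuristic, not what diamond or the NCBI library does internally.

```shell
# Hypothetical helper: succeed if PREFIX looks like a protein BLAST database
# prefix, i.e. PREFIX.pal (alias file) or PREFIX.pin (index file) exists.
is_protein_blastdb_prefix() {
    prefix=$1
    [ -f "$prefix.pal" ] || [ -f "$prefix.pin" ]
}

# Example (not executed here; the directory is a placeholder):
#   is_protein_blastdb_prefix /data/blastdb/nr && diamond prepdb -d /data/blastdb/nr
```

Passing the directory alone, or a name with an extension such as nr.fa, fails this check, which matches the kind of path the error message complains about.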

You need to download a version that was compiled with BLAST db support, e.g. here: https://github.com/bbuchfink/diamond/releases

On Thu, Feb 8, 2024 at 16:48, vdnadung @.*** wrote:

Hello @bbuchfink,

I also have the error message Error: This executable was not compiled with support for BLAST databases. Here is what I did (using a HPC):

  • download nr.*.tar.gz files, save in nr folder
  • tar -xf .nr.*.tar.gz files
  • module load DIAMOND/2.1.8-GCC-12.2.0
  • diamond prepdb -d nr/nr
  • Error: This executable was not compiled with support for BLAST databases.

Could you please help? Thank you in advance.

Best wishes Dung

bbuchfink avatar Feb 13 '24 10:02 bbuchfink
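
A side note on the extraction step in the list above, as a minimal sketch: tar reads one archive per -f, so a set of nr.*.tar.gz volumes is usually unpacked in a loop. The function name and the directory layout are hypothetical. This does not address the "not compiled with support for BLAST databases" error, which depends on the diamond binary itself.

```shell
# Hypothetical helper: unpack every .tar.gz archive found in a directory
# into that same directory, one archive at a time.
extract_blastdb_volumes() {
    dir=$1
    for archive in "$dir"/*.tar.gz; do
        [ -e "$archive" ] || continue   # glob matched nothing
        tar -xzf "$archive" -C "$dir" || return 1
    done
}

# Example (not executed here):
#   extract_blastdb_volumes nr
#   diamond prepdb -d nr/nr
```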

Hi @bbuchfink,

I used DIAMOND/2.1.8 so I don't know why it happened.

Best wishes Dung

vdnadung avatar Feb 14 '24 13:02 vdnadung