hh-suite
Both HHblits and HHsearch give misaligned indels for homologous sequences
I put one chain from a PDB entry into my library, then search it with a homologous chain that differs by indels, using either HHblits or HHsearch. The gaps in the resulting alignment are not placed where the indels are.
Expected Behavior - indels should align
Current Behavior - indels do not align, and the sequence identity is lower than it "obviously" would be if they did. NCBI BLAST gives 97.37% sequence identity (the indels are in the right place); HHblits reports 88%.
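To illustrate why a misplaced gap lowers the reported identity, here is a minimal sketch. It assumes identity is counted over columns where neither row has a gap; BLAST and HH-suite may use different denominators, so the function is illustrative rather than a reimplementation of either tool. The sequences are hypothetical toy strings, not taken from 5vol.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    aligned = identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # gap columns are left out of the denominator (assumption)
        aligned += 1
        identical += a == b
    return 100.0 * identical / aligned if aligned else 0.0


# Hypothetical toy example: the query is the target with "EFG" deleted.
target = "ACDEFGHIK"
well_placed = "ACD---HIK"  # gap sits opposite the deleted residues
misplaced = "ACDHI---K"    # same gap length, opened two columns too late

print(percent_identity(well_placed, target))
print(percent_identity(misplaced, target))
```

With the gap in the right place every aligned column matches; shifting the same gap by two columns turns two of the six aligned columns into mismatches, even though the underlying sequences are unchanged.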
Steps to Reproduce (for bugs)
Put the sequence of chain C from 5vol into the library, then run chain A from 5vol as the query against it. Chain C has a leading PW at the N-terminus and a seven-residue insertion, QGAVPAD, at residues 184-190. Chain A has an extra G at the C-terminus. In all other respects the two chains have 100% sequence identity.
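The relationship described above can be checked directly on the ungapped fragments that appear in the HH-suite output further down (query residues 121-233, target residues 123-242 in the output's own numbering). This is a rough sketch: `single_insertion` is a hypothetical helper, not part of HH-suite, and it only tests whether the longer fragment equals the shorter one plus a single contiguous insertion.

```python
# Ungapped fragments copied from the alignment output in this report.
QUERY = ("IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN"
         "SCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR")
TARGET = ("IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL"
          "TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR")
TARGET_START = 123  # residue number of TARGET[0] in the output


def single_insertion(shorter: str, longer: str):
    """Return (offset, inserted) if `longer` is `shorter` plus one contiguous
    insertion, otherwise None. Hypothetical helper for illustration."""
    if len(longer) < len(shorter):
        return None
    # Length of the common prefix.
    prefix = 0
    while prefix < len(shorter) and shorter[prefix] == longer[prefix]:
        prefix += 1
    # Length of the common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < len(shorter) - prefix
           and shorter[-1 - suffix] == longer[-1 - suffix]):
        suffix += 1
    if prefix + suffix != len(shorter):
        return None  # more than one difference between the sequences
    return prefix, longer[prefix:len(longer) - suffix]


result = single_insertion(QUERY, TARGET)
if result is not None:
    offset, inserted = result
    first = TARGET_START + offset
    print(inserted, first, first + len(inserted) - 1)
```

On these fragments the only difference is the QGAVPAD insertion, at target residues 167-173 in the output's numbering. That need not match the 184-190 quoted above, which may follow a different residue numbering scheme.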
command to run:
/bmm/soft/linux64/src/hh-suite-bin/bin/hhblits -n 1 -i /bmm/www/servers/phyre2/test/hmm/test_c7xrt//c5volA_.hhblits.hhm -d /bmm/www/servers/phyre2/test/hmm/full -o /bmm/www/servers/phyre2/test/hmm/test_c7xrt//c5volA_.hhblits.hhr -b 100 -norealign -z 500 -alt 1 -aliw 60
HH-suite Output (for bugs)
See the attached file; the interesting part is below. Note that the indel in c5volC_ (target) is around residues 168-174, but the gap in the query (c5volA_) is placed around residues 196-202.
Q ss_dssp         CCSGGGEEEEEETHHHHHHHHHHHHTTTTCSEEEEESCCSSCCCCTTSHHHHHHHHHHHT
Q ss_pred         ccchhheeecccchhHHHHHHHHhhcccccceeeeeccccCccCccccccccccccCCCC
Q c5volA_     121 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN 180 (260)
Q Consensus   121 ~~~~~~~~~~g~s~g~~~a~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 180 (260)
                  ..+..++.+.|.|.|+..+...+...+..+..++..++......................
T Consensus   123 ~~~~~~~~~~G~S~Gg~~a~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 182 (268)
T c5volC_     123 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL 182 (268)
T ss_dssp         CCSGGGEEEEEETHHHHHHHHHHHHCTTTCSEEEEESCCSSCCSSC---CCCTTSHHHHH
T ss_pred         CCCCcccEEEEEccchHHHHHHHHhChHHhHHHhhccccccccccccccccccccCccch

Q ss_dssp         CHHHHHHTCCHHHHH-------HHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
Q ss_pred         chHHHHhhcchhhhh-------ccccccccccccccCccchHHHHHHHHHHHCCCcEEEE
Q c5volA_     181 SCVKYVMEADEDRKA-------DLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 233 (260)
Q Consensus   181 ~~~~~~~~~~~~~~~-------~~~~~~~~~~~~~~~~~~~~~~~~~~~L~~~g~~~~~~ 233 (260)
                  ...............       ....+++++.+++.|....++++++++|++.|+++++.
T Consensus   183 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~g~~D~~~~~~~~~~~~l~~~g~~~~~~ 242 (268)
T c5volC_     183 TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 242 (268)
T ss_dssp         HHHHHHTCHHHHHHTCCHHHHHHHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
T ss_pred         hHHHHhcCHHHHHHhcChhhhhhccCceEEEEecCchHhHHHHHHHHHHHHHCCCCcEEE
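The displacement can be quantified from the two sequence rows of the output above. The sketch below maps the gap run in the query row onto target residue numbers and compares it with where QGAVPAD actually occurs in the target row. `residues_opposite_gaps` is a hypothetical helper written for this report, and the start residue is read off the `T c5volC_` line.

```python
# Sequence rows copied from the two alignment blocks above.
Q_ROW = ("IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN"
         "SCVKYVMEADEDRKA-------DLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR")
T_ROW = ("IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL"
         "TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR")
T_START = 123  # first target residue number shown in the output


def residues_opposite_gaps(gapped_row, other_row, other_start):
    """List (first, last) residue numbers of `other_row` that face each
    run of gaps in `gapped_row`. Hypothetical helper for illustration."""
    runs = []
    pos = other_start - 1  # number of the last residue consumed in other_row
    in_gap = False
    for g, o in zip(gapped_row, other_row):
        if o == "-":
            in_gap = False
            continue
        pos += 1
        if g == "-":
            if in_gap:
                runs[-1][1] = pos      # extend the current run
            else:
                runs.append([pos, pos])  # start a new run
            in_gap = True
        else:
            in_gap = False
    return [tuple(run) for run in runs]


gap_faces = residues_opposite_gaps(Q_ROW, T_ROW, T_START)
motif_first = T_START + T_ROW.replace("-", "").find("QGAVPAD")
print("target residues facing the query gap:", gap_faces)
print("QGAVPAD starts at target residue:", motif_first)
```

In this output the query gap faces target residues 198-204 (ADEDRKA), whereas QGAVPAD occupies target residues 167-173, so the gap is opened 31 residues downstream of the insertion and every column in between is paired with a shifted residue.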
Context
If a straightforward comparison between two homologous chains gives an apparently erroneous alignment, how can I trust the results for more complicated alignments with lower sequence identity?
Your Environment
- Version/Git commit used: last publicly released version
- Server specifications (especially CPU support for AVX2/SSE and amount of system memory): Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (happy to upload the output of 'more /proc/cpuinfo' if that would help), 264GB physical RAM
- Operating system and version: Red Hat Enterprise Linux Workstation release 6.6 (Santiago)