
Both HHblits and HHsearch give misaligned indels for homologous sequences

Status: Open. hrp1000 opened this issue 7 months ago (0 comments).

I put one chain from a PDB entry into my library, then ran either HHblits or HHsearch with another homologous chain (one that contains indels) as the query. The indels do not line up between query and target.

Expected Behavior

The indels should align.

Current Behavior

The indels do not align, and the sequence identity is lower than it "obviously" would be if they did. NCBI BLAST gives 97.37% sequence identity (with the indels in the right place); HHblits reports 88%.
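A quick check of the arithmetic behind the BLAST figure, as a minimal Python sketch. The residue counts are my inference from the differences listed under Steps to Reproduce and from the chain lengths (260 and 268) in the .hhr output. That BLAST divides identities by alignment length including gap columns is also an assumption here.

```python
# Hypothetical sanity check: the residue counts are inferred from the issue text,
# not read from BLAST or HHblits output.
QUERY_LEN = 260    # c5volA_ length, from "(260)" in the .hhr output
TARGET_LEN = 268   # c5volC_ length, from "(268)" in the .hhr output

# Differences described in the issue: chain C has a leading "PW" and the
# 7-residue insertion "QGAVPAD"; chain A has one extra C-terminal "G".
extra_in_target = len("PW") + len("QGAVPAD")
extra_in_query = len("G")

shared = QUERY_LEN - extra_in_query              # residues present in both chains
assert shared == TARGET_LEN - extra_in_target    # both lengths agree: 259

# A correct local alignment spans the shared residues plus 7 gap columns for the
# insertion; the terminal overhangs (PW, G) fall outside it.
alignment_columns = shared + len("QGAVPAD")

# Assumed BLAST-style identity: identical pairs / alignment length incl. gaps.
blast_style = 100 * shared / alignment_columns
print(f"{shared}/{alignment_columns} = {blast_style:.2f}%")
```

If those assumptions hold, 259/266 (97.37%) is what a correctly placed 7-residue gap would give.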

Steps to Reproduce (for bugs)

Put the sequence of chain C from PDB entry 5vol into the library, then run chain A of 5vol as the query against it. Chain C has an extra leading PW at the N-terminus and a 7-residue indel, QGAVPAD, at residues 184-190. Chain A has an extra G at the C-terminus. Otherwise, the two chains are 100% identical.
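For sequences this similar, the position of the indel should not be ambiguous. The Python sketch below illustrates this on a short stretch copied from the .hhr output (chain A: SALMSIPEDPNSKIAIL; chain C: SALMSIPEQGAVPADDPNSKIAIL) by trying every position of a single 7-residue gap and counting identical pairs. It is only an illustration: HHblits and HHsearch align profile HMMs with dynamic programming, not with this exhaustive count, and `identities_with_gap` is a hypothetical helper.

```python
# Hypothetical illustration: brute-force every placement of one 7-residue gap in
# the query and count identical pairs. This is NOT how HHblits/HHsearch align;
# it only shows that, for these sequences, one gap position is clearly best.
QUERY = "SALMSIPEDPNSKIAIL"           # c5volA_ fragment, from the .hhr output
TARGET = "SALMSIPEQGAVPADDPNSKIAIL"   # c5volC_ fragment, from the .hhr output
GAP = len(TARGET) - len(QUERY)        # 7, the length of QGAVPAD


def identities_with_gap(query: str, target: str, start: int, gap: int) -> int:
    """Count identical pairs when `gap` target residues are skipped after query[:start]."""
    left = sum(a == b for a, b in zip(query[:start], target[:start]))
    right = sum(a == b for a, b in zip(query[start:], target[start + gap:]))
    return left + right


scores = {s: identities_with_gap(QUERY, TARGET, s, GAP) for s in range(len(QUERY) + 1)}
best = max(scores, key=scores.get)
print(f"best gap start: after query residue {best} "
      f"({QUERY[:best]}|{QUERY[best:]}), {scores[best]}/{len(QUERY)} identical")
print(f"target residues opposite the gap: {TARGET[best:best + GAP]}")
```

Only a gap placed right after SIPE, opposite QGAVPAD, makes all 17 query residues identical; every other placement scores lower.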

Command used:

/bmm/soft/linux64/src/hh-suite-bin/bin/hhblits -n 1 -i /bmm/www/servers/phyre2/test/hmm/test_c7xrt//c5volA_.hhblits.hhm -d /bmm/www/servers/phyre2/test/hmm/full -o /bmm/www/servers/phyre2/test/hmm/test_c7xrt//c5volA_.hhblits.hhr -b 100 -norealign -z 500 -alt 1 -aliw 60

HH-suite Output (for bugs)

See the attached file; the relevant part is below. Note that the indel in c5volC_ (target) appears around residues 168-174, whereas in the query (c5volA_) the gap appears around residues 196-202.

Q ss_dssp             CCSGGGEEEEEETHHHHHHHHHHHHTTTTCSEEEEESCCSSCCCCTTSHHHHHHHHHHHT
Q ss_pred             ccchhheeecccchhHHHHHHHHhhcccccceeeeeccccCccCccccccccccccCCCC
Q c5volA_         121 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN 180 (260)
Q Consensus       121 ~~~~~~~~~~g~s~g~~~a~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 180 (260)
                      ..+..++.+.|.|.|+..+...+...+..+..++..++......................
T Consensus       123 ~~~~~~~~~~G~S~Gg~~a~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 182 (268)
T c5volC_         123 IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL 182 (268)
T ss_dssp             CCSGGGEEEEEETHHHHHHHHHHHHCTTTCSEEEEESCCSSCCSSC---CCCTTSHHHHH
T ss_pred             CCCCcccEEEEEccchHHHHHHHHhChHHhHHHhhccccccccccccccccccccCccch

Q ss_dssp             CHHHHHHTCCHHHHH-------HHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
Q ss_pred             chHHHHhhcchhhhh-------ccccccccccccccCccchHHHHHHHHHHHCCCcEEEE
Q c5volA_         181 SCVKYVMEADEDRKA-------DLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 233 (260)
Q Consensus       181 ~~~~~~~~~~~~~~~-------~~~~~~~~~~~~~~~~~~~~~~~~~~~L~~~g~~~~~~ 233 (260)
                      ...............       ....+++++.+++.|....++++++++|++.|+++++.
T Consensus       183 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~g~~D~~~~~~~~~~~~l~~~g~~~~~~ 242 (268)
T c5volC_         183 TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR 242 (268)
T ss_dssp             HHHHHHTCHHHHHHTCCHHHHHHHTTSEEEEECCTTCTTHHHHHHHHHHHHHTTCCCEEE
T ss_pred             hHHHHhcCHHHHHHhcChhhhhhccCceEEEEecCchHhHHHHHHHHHHHHHCCCCcEEE
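To quantify what the misplaced gap costs, the Python sketch below takes the aligned rows from the two blocks above, builds a hypothetical corrected query row with the 7-residue gap moved opposite QGAVPAD, and counts identical pairs in each. The last step assumes, without checking the full .hhr file, that the rest of the HHblits alignment is correct and spans the 259 residues common to both chains, and that HHblits computes identity over aligned residue pairs. The helper `identity` is illustrative only.

```python
# Aligned rows copied from the two .hhr blocks above
# (query c5volA_ 121-233, target c5volC_ 123-242).
HH_QUERY = ("IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEDPNSKIAILTRSVIEN"
            "SCVKYVMEADEDRKA-------DLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR")
HH_TARGET = ("IGDRQHRAIAGLSMGGGGATNYGQRHSDMFCAVYAMSALMSIPEQGAVPADDPNSKIAIL"
             "TRSVIENSCVKYVMEADEDRKADLRSVAWFVDCGDDDFLLDRNIEFYQAMRNAGVPCQFR")

# Hypothetical corrected query row: same residues, with the 7-column gap moved
# to sit opposite the target's QGAVPAD insertion (right after "...SIPE").
residues = HH_QUERY.replace("-", "")
cut = residues.index("SIPE") + len("SIPE")
FIXED_QUERY = residues[:cut] + "-" * 7 + residues[cut:]


def identity(query_row: str, target_row: str) -> tuple[int, int]:
    """Return (identical pairs, aligned residue pairs); gap columns are skipped."""
    pairs = [(a, b) for a, b in zip(query_row, target_row) if a != "-" and b != "-"]
    return sum(a == b for a, b in pairs), len(pairs)


hh_same, hh_aligned = identity(HH_QUERY, HH_TARGET)
fix_same, fix_aligned = identity(FIXED_QUERY, HH_TARGET)
print(f"HHblits rows:   {hh_same}/{hh_aligned} identical in this window")
print(f"corrected rows: {fix_same}/{fix_aligned} identical in this window")

# Assumption: outside this window the HHblits alignment is correct and covers
# the 259 residues shared by both chains, so only this window's mismatches count.
mismatches = hh_aligned - hh_same
full_aligned = 259
print(f"whole alignment: {100 * (full_aligned - mismatches) / full_aligned:.0f}% identity")
```

Under those assumptions, the 31 shifted positions in this window (query residues 165-195, none identical to its partner) would account for the whole drop from 100% to the reported 88%.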

Context

If a straightforward comparison between two nearly identical homologous chains gives an apparently erroneous alignment, how can I trust the results for harder cases with lower sequence identity?

Your Environment

  • Version/Git commit used: last publicly released version

  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory): Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz (happy to upload the output of 'more /proc/cpuinfo' if that would help), 264GB physical RAM

  • Operating system and version: Red Hat Enterprise Linux Workstation release 6.6 (Santiago)

c5volA_.hhblits.txt

Posted by hrp1000, Dec 07 '23 10:12