
Parsed hhr file has different name with a3m file

Open CryoSky opened this issue 4 years ago • 1 comments

:exclamation: Make sure to check out our User Guide.

Expected Behavior

The hhr file should list the same entries as the a3m file.

Current Behavior

When I run this command:

    hhblits -i query.fasta -o query.hhr -oa3m query.a3m -Z 50 -B 50 -d database/pdb70 -Ofas test.fasta -hide_cons -hide_pred -hide_dssp

I expect the hhr file to show the same results as the a3m file. However, the hhr file lists PDB IDs:

    No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
     1 1C7K_A ZINC ENDOPROTEASE (E.C.  99.8 8.3E-26 6.1E-30  126.6   0.0  131     1-132   1-132 (132)
     2 4HX3_C Extracellular small neu  99.8 1.9E-25 1.4E-29  125.3   0.0  131     1-132   3-134 (134)
     3 6BTP_A Bone morphogenetic prot  99.7 5.5E-22 4.1E-26  118.0   0.0   93      2-99  16-110 (201)
     4 3EDH_A Bone morphogenetic prot  99.7 6.8E-22 5.1E-26  117.6   0.0   93      2-99  15-109 (201)
     5 5UE2_A Matrilysin (E.C.3.4.24.  99.7 1.1E-21 7.8E-26  120.3   0.0  128     2-132  89-241 (247)

while the corresponding a3m file contains entries with different names:

    >1C7K:A|PDBID|CHAIN|SEQUENCE
    TVTVTYDPSNAPSFQQEIANAAQIWNSSVRNVQLRAGGNADFSYYEGNDSRGSYAQTDGHGRGYIFLDYQQNQQYDSTRVTAHETGHVLGLPDHYQGPCSELMSGGGPGPSCTNPYPNAQERSRVNALWANG
    >tr|I4X267|I4X267_9BACL Uncharacterized protein OS=Planococcus antarcticus DSM 14505 GN=A1A1_14264 PE=4 SV=1
    --------------SSHTDYGLTNWNPVSSKVYISSTtsaSNAEIKVYAGDINKeGVYADALNYNinwlgqvtacwdcsysASRIRINTPVAKNYSkdRINaVMAHEAGHSLGINHSSVNTAndKALMLPNIL---SGNQVRIWDDNAALKSIYGP-
    >tr|W4AXT5|W4AXT5_9BACL Uncharacterized protein OS=Paenibacillus sp. FSL R5-808 GN=C169_14064 PE=4 SV=1
    --------------EGYFNTGKNNWNNISSKVGPLTYnqnsevnGKKSDRYYVGSTTNsGVLGYFNPMLnngtsvnpfesswdYGSIYAYKNQIdlyglTSAQITSsVATHEVGHSLSLSHNFGSACnnnCVMTANALT-----SIAPNTEDKTQLKNKWGN-

Do the a3m entries, such as tr|I4X267|I4X267_9BACL, correspond to the PDB IDs in the hit list, in the same order? If not, how can I generate a multiple sequence alignment in FASTA format for the top 20 hits in the hit list?

Context

Providing context helps us come up with a solution and improve our documentation for the future.

Your Environment

Include as many relevant details about the environment you experienced the issue in.

  • Version/Git commit used: 3.1.0
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
  • Operating system and version: CentOS 7

CryoSky avatar Nov 26 '19 02:11 CryoSky

It's an old issue, but maybe this answer will be useful for someone.

It's probably related to output filtering: by default, sequences with more than 90% pairwise sequence identity to another sequence in the MSA are filtered out (see the -id option), so a sequence listed in the hits may end up removed from the a3m. If you want an MSA with all sequences, use the -all option.
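As a rough sketch of that suggestion (untested against this dataset): the first function reruns the command from the issue with -all added, and the other two print the names from each output file so they can be compared side by side. The function names `run_unfiltered`, `a3m_names` and `hhr_hits` are made up for illustration, and the hhr parsing assumes the hit table starts at the "No Hit" header line and ends at the first blank line.

```shell
# Rerun the search from the issue with -all, so the result MSA is not filtered.
# File and database paths are the ones used in the issue; adjust as needed.
run_unfiltered() {
  hhblits -i query.fasta -o query.hhr -oa3m query.a3m \
          -Z 50 -B 50 -d database/pdb70 -all
}

# Print the sequence names in an a3m/FASTA file
# (first word of every header line, without the leading '>').
a3m_names() {
  awk '/^>/ { print substr($1, 2) }' "$1"
}

# Print the hit identifiers (second column) from the hit table of an hhr file.
# Assumes the table begins after the "No Hit ..." header and ends at a blank line.
hhr_hits() {
  awk '/^ *No Hit/ { in_table = 1; next }
       in_table && NF == 0 { exit }
       in_table { print $2 }' "$1"
}
```

After calling `run_unfiltered`, running `hhr_hits query.hhr` and `a3m_names query.a3m` shows which names appear in each file.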

marta-sd avatar Mar 30 '21 08:03 marta-sd