foldseek: understanding the similarities of hits?

Expected Behavior

Hi, I'd like to make sure that my understanding of the foldseek easy-search hits is correct.

By default (--alignment-type 2, without TM-align), the TM-score used for filtering is just the alntmscore (normalized by the alignment length), correct? You've mentioned (https://github.com/steineggerlab/foldseek/issues/72) that the homology probability, prob, can be used to evaluate hits and make a final decision. However, by default the results of foldseek easy-search are sorted in decreasing order of a similarity value, bitScore * sqrt(alnlddt * alntmscore), not by bit score or prob. This similarity value is also not available in --format-output.

Do you think this similarity index is more appropriate and relevant for reflecting structural similarity than the other two indices (bit score, prob)? How should I understand these three indices correctly? Normally, two structures are considered similar if the TM-score between them exceeds 0.5. What threshold on this similarity index (or on bit score and prob) could be used to decide that two structures are similar?
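For reference, the ranking value described above can be recomputed from the individual scores. This is only a minimal sketch of the formula as stated in this question (bitScore * sqrt(alnlddt * alntmscore)); the function name and the example numbers are made up for illustration, and it is not taken from the foldseek source.

```python
import math


def similarity_score(bits, alnlddt, alntmscore):
    """Ranking value as described in the question:
    bitScore * sqrt(alnlddt * alntmscore).

    Hypothetical helper for illustration, not a foldseek API.
    """
    return bits * math.sqrt(alnlddt * alntmscore)


# Made-up hits: (target, bit score, alignment LDDT, alignment TM-score)
hits = [
    ("t1", 100.0, 0.64, 0.25),
    ("t2", 80.0, 0.81, 0.81),
]

# Sort in decreasing order of the similarity value, like the default output.
ranked = sorted(hits, key=lambda h: similarity_score(h[1], h[2], h[3]), reverse=True)
ranked_targets = [h[0] for h in ranked]
```

Note that the hit with the lower bit score ranks first here, because its LDDT and TM-score are higher.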

Thanks for your time and reply!

Apr 20 '23 04:04 BinhongLiu

Foldseek uses a well-calibrated, BLAST-like E-value to evaluate the statistical significance of structural alignments. A lower E-value means a more significant match, i.e. one that is less likely to have occurred by chance. E-values are affected by database size. Typically, hits with an E-value below 0.1 are considered homologous. For hits with E-values greater than 0.1, we introduced a probability (referred to as prob) that estimates the likelihood of homology and helps in assessing more distant hits. The prob value does not depend on database size. The appropriate E-value or prob threshold may vary based on your specific use case.
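As a rough illustration of how these two values could be combined when post-processing results, here is a minimal sketch. It assumes the search was run with a tab-separated output containing the columns query, target, evalue, bits and prob, in that order. The 0.1 E-value cutoff is the one mentioned above; the prob cutoff of 0.5, the label names and the function name are placeholders chosen for the example, not recommended values.

```python
import csv
import io

E_VALUE_CUTOFF = 0.1  # cutoff mentioned in the reply above
PROB_CUTOFF = 0.5     # hypothetical placeholder; choose per use case

# Assumed column order of the tab-separated result file.
COLUMNS = ["query", "target", "evalue", "bits", "prob"]


def classify_hits(tsv_text, prob_cutoff=PROB_CUTOFF):
    """Label each hit using the E-value first, then prob as a fallback."""
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t")
    results = []
    for row in reader:
        evalue = float(row["evalue"])
        prob = float(row["prob"])
        if evalue < E_VALUE_CUTOFF:
            label = "significant"   # E-value below 0.1
        elif prob >= prob_cutoff:
            label = "candidate"     # weak E-value, but prob is high
        else:
            label = "unsupported"
        results.append((row["target"], label))
    return results


# Made-up rows for illustration only.
example = "q1\tt1\t1e-5\t300\t1.0\nq1\tt2\t0.5\t60\t0.8\nq1\tt3\t5\t20\t0.1\n"
labels = classify_hits(example)
```

Because prob does not depend on database size, the second branch gives the same label for a hit regardless of which database it was found in, while the E-value branch does not.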

Apr 26 '23 16:04 martin-steinegger

Great! Thanks for your explanation!

Apr 28 '23 12:04 BinhongLiu
