foldseek
understanding the similarities of hits?
Expected Behavior
Hi, I'd like to make sure that my understanding of the foldseek easy-search hits is correct.
By default (--alignment-type 2, without TM-align), the TM-score used for filtering is just the alntmscore (normalized by the alignment length), correct? You've also mentioned (https://github.com/steineggerlab/foldseek/issues/72) that the homology probability, prob, can be used to evaluate the hits and make a final decision.
However, by default the results of foldseek easy-search are sorted in decreasing order of the similarity bitScore * sqrt(alnlddt * alntmscore), not by bit score or prob. This similarity value is also not available in --format-output. Do you think this similarity index is more appropriate and relevant for reflecting structural similarity than the other two indices (bit score, prob)? How should I understand these three indices correctly?
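Since the similarity value cannot be requested as an output column, it could be recomputed from the individual scores after the search. The following is a minimal sketch, assuming the formula quoted above and that the bit score, alnlddt, and alntmscore of each hit have already been parsed into a dictionary; the function names, dictionary keys, and example values are all hypothetical and not taken from foldseek output.

```python
import math


def structural_similarity(bits: float, alnlddt: float, alntmscore: float) -> float:
    """Return bitScore * sqrt(alnlddt * alntmscore), the formula quoted in the question."""
    return bits * math.sqrt(alnlddt * alntmscore)


def rank_hits(hits: list[dict]) -> list[dict]:
    """Sort hits by the recomputed similarity, highest first."""
    return sorted(
        hits,
        key=lambda h: structural_similarity(h["bits"], h["alnlddt"], h["alntmscore"]),
        reverse=True,
    )


# Illustrative values only (not real search results).
hits = [
    {"target": "A", "bits": 200.0, "alnlddt": 0.25, "alntmscore": 0.25},
    {"target": "B", "bits": 150.0, "alnlddt": 0.81, "alntmscore": 0.64},
]
ranked = rank_hits(hits)
```

In this made-up example, hit B ranks above hit A despite its lower bit score, which is the kind of reordering the question is asking about.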
Normally, if the TM-score between two structures exceeds 0.5, they are considered similar. What threshold of this similarity index (or of bit score and prob) could be used to decide that two structures are similar?
Thanks for your time and reply!
Foldseek uses a well-calibrated, BLAST-like E-value to evaluate the statistical significance of structural alignments. Lower E-values correspond to more significant matches, that is, matches less likely to have occurred by chance. E-values are affected by database size. Typically, hits with an E-value below 0.1 are considered homologous. For hits with E-values greater than 0.1, we introduced a probability (referred to as prob) that estimates the likelihood of homology and helps assess how distant the hit is. The prob value does not depend on database size. The appropriate E-value or prob threshold may vary with your specific use case.
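The two-step rule described in this reply could be sketched as follows. The 0.1 E-value cutoff comes from the reply; the prob cutoff of 0.9 is only a placeholder assumption, since no specific value is recommended here, and the function name is hypothetical.

```python
def likely_homolog(
    evalue: float,
    prob: float,
    evalue_cutoff: float = 0.1,  # cutoff stated in the reply
    prob_cutoff: float = 0.9,    # placeholder assumption; tune for your use case
) -> bool:
    """Accept a hit on its E-value first, then fall back to prob."""
    if evalue < evalue_cutoff:
        return True
    # For weaker E-values, rely on the database-size-independent probability.
    return prob >= prob_cutoff
```

Both cutoffs are exposed as parameters because, as noted above, the appropriate thresholds depend on the use case.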
Great! Thanks for your explanation!