
Running hhblits with the same inputs twice yields different a3m alignments

Open tanhevg opened this issue 3 years ago • 1 comments

Expected Behavior

I expect hhblits to produce the same alignments when executed on the same input multiple times.

Current Behavior

When default settings are used, the alignments are identical. But when the defaults are changed, even in a seemingly insignificant way such as adding -e 0.001, the results of two runs begin to diverge. The more non-default options are added, the more the results diverge (see the alphafold case in the archive). Reading the input sequence from a file, as opposed to standard input as in the steps below, might change the behaviour slightly, but the fundamental problem remains.

Steps to Reproduce (for bugs)

All the scripts and input files are in the attached archive. First, run hhblits with default parameters twice:

cat test.a3m | hhblits -i stdin -oa3m test1.a3m -o test1.hhr -cpu 8 -d /data/uniclust/uniclust30_2018_08/uniclust30_2018_08 > test1.out 2>test1.err
cat test.a3m | hhblits -i stdin -oa3m test2.a3m -o test2.hhr -cpu 8 -d /data/uniclust/uniclust30_2018_08/uniclust30_2018_08 > test2.out 2>test2.err

Observe that test1.a3m and test2.a3m contain identical sets of sequences (the diff is empty):

grep '^>' test1.a3m | awk -F '|' '{print $2}' | sort -u > test1_a3m_list.txt
grep '^>' test2.a3m | awk -F '|' '{print $2}' | sort -u > test2_a3m_list.txt
diff -q test1_a3m_list.txt test2_a3m_list.txt

Now, modify the hhblits command line slightly:

cat test.a3m | hhblits -i stdin -oa3m test1.a3m -o test1.hhr -cpu 8 -e 0.001 -d /data/uniclust/uniclust30_2018_08/uniclust30_2018_08 > test1.out 2>test1.err
cat test.a3m | hhblits -i stdin -oa3m test2.a3m -o test2.hhr -cpu 8 -e 0.001 -d /data/uniclust/uniclust30_2018_08/uniclust30_2018_08 > test2.out 2>test2.err

The same script for validating the results now shows a non-empty diff.
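To see how far two runs diverge, rather than only whether they differ, the same ID extraction can be wrapped in a small helper. This is just a sketch: the function names a3m_ids and a3m_id_diff are made up for illustration and are not part of HH-suite, and it assumes, as the commands above do, that the sequence ID is the second '|'-delimited field of each header line.

```shell
# Print the sorted, unique IDs taken from the second '|'-delimited field
# of every header line in an a3m file (same extraction as the commands above).
a3m_ids() {
    grep '^>' "$1" | awk -F '|' '{print $2}' | sort -u
}

# Hypothetical helper: count the IDs that appear in only one of two a3m files.
a3m_id_diff() {
    ids1=$(mktemp) || return 1
    ids2=$(mktemp) || { rm -f "$ids1"; return 1; }
    a3m_ids "$1" > "$ids1"
    a3m_ids "$2" > "$ids2"
    # comm needs sorted input; a3m_ids already sorts with the same locale.
    only1=$(comm -23 "$ids1" "$ids2" | wc -l | tr -d '[:space:]')
    only2=$(comm -13 "$ids1" "$ids2" | wc -l | tr -d '[:space:]')
    rm -f "$ids1" "$ids2"
    echo "only_in_first=$only1 only_in_second=$only2"
}
```

Running a3m_id_diff test1.a3m test2.a3m after each pair of runs prints both counts as zero when the ID sets match, and non-zero counts when they do not.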

HH-suite Output (for bugs)

All standard output and error streams are included in the archive.

Context

I was actually playing with alphafold, and was curious whether the same features are used every time. Running hhblits twice on the same input with the alphafold settings yields results that differ even more.

There was a similar issue filed a while ago, #198, but there has been no activity for some time, so I decided to raise another one.

Your Environment

  • hhblits v3.3.0, built from sources
  • 24 cores
  • AVX and SSE2 enabled, no AVX2
  • 64 GB RAM
  • CentOS 7.6, Linux Kernel 3.10

tanhevg avatar Nov 23 '21 10:11 tanhevg

How can hhblits results be made reproducible? Is there any solution?

KK666-AI avatar Dec 25 '21 06:12 KK666-AI