hh-suite
Running hhblits with the same inputs twice yields different a3m alignments
Expected Behavior
I expect hhblits to produce the same alignments when it is run on the same input multiple times.
Current Behavior
When default settings are used, the alignments are the same. But when the defaults are changed, even in a way that should not be meaningful, such as adding -e 0.001, results from two runs begin to diverge. The more non-default options are added, the more the results diverge (see alphafold in the archive). Reading the input sequence from a file, as opposed to standard input in the steps below, might change the behaviour slightly, but the fundamental problem remains.
Steps to Reproduce (for bugs)
All the scripts and input files are in the attached archive. First, run hhblits with default parameters twice:
cat test.a3m | hhblits -i stdin -oa3m test1.a3m -o test1.hhr -cpu 8 -d /data/uniclust/uniclust30_2018_08/uniclust30_2018_08 > test1.out 2>test1.err
cat test.a3m | hhblits -i stdin -oa3m test2.a3m -o test2.hhr -cpu 8 -d /data/uniclust/uniclust30_2018_08/uniclust30_2018_08 > test2.out 2>test2.err
Observe that test1.a3m and test2.a3m contain the identical set of sequences (there is no diff):
grep '^>' test1.a3m | awk -F '|' '{print $2}' | sort -u > test1_a3m_list.txt
grep '^>' test2.a3m | awk -F '|' '{print $2}' | sort -u > test2_a3m_list.txt
diff -q test1_a3m_list.txt test2_a3m_list.txt
Now, modify the hhblits command line slightly:
cat test.a3m | hhblits -i stdin -oa3m test1.a3m -o test1.hhr -cpu 8 -e 0.001 -d /data/uniclust/uniclust30_2018_08/uniclust30_2018_08 > test1.out 2>test1.err
cat test.a3m | hhblits -i stdin -oa3m test2.a3m -o test2.hhr -cpu 8 -e 0.001 -d /data/uniclust/uniclust30_2018_08/uniclust30_2018_08 > test2.out 2>test2.err
The same script for validating the results now shows a non-empty diff.
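For convenience, the validation steps above can be wrapped into a small helper that also reports how many identifiers are unique to each run. This is only a sketch: the function names a3m_ids and compare_a3m are hypothetical, and it assumes headers of the form ">db|ID|name", the same assumption the awk call above makes. Unlike the commands above, it skips headers that contain no "|".

```shell
#!/bin/sh
# Print the sorted, unique identifiers (second '|'-separated field)
# found in the header lines of an a3m file.
# Headers without a '|' (for example a plain query header) are skipped.
a3m_ids() {
    grep '^>' "$1" | awk -F '|' 'NF >= 2 {print $2}' | sort -u
}

# Compare the identifier sets of two a3m files.
# Prints how many identifiers occur in only one of the files and
# returns 0 if the sets are identical, 1 otherwise.
compare_a3m() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 2
    a3m_ids "$1" > "$tmp1"
    a3m_ids "$2" > "$tmp2"
    # comm -23: lines only in the first file; comm -13: only in the second.
    only1=$(comm -23 "$tmp1" "$tmp2" | wc -l | tr -d ' ')
    only2=$(comm -13 "$tmp1" "$tmp2" | wc -l | tr -d ' ')
    rm -f "$tmp1" "$tmp2"
    echo "only in $1: $only1, only in $2: $only2"
    [ "$only1" -eq 0 ] && [ "$only2" -eq 0 ]
}
```

After sourcing the functions, "compare_a3m test1.a3m test2.a3m" prints the two counts, and its exit status can be used in a loop over repeated runs.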
HH-suite Output (for bugs)
All standard output and error streams are included in the archive.
Context
I was actually playing with alphafold, and was curious whether the same features are used every time. Running hhblits twice on the same input with alphafold settings yields results that differ even more.
There was a similar issue filed a while ago, #198, but there has been no activity for some time, so I decided to raise another one.
Your Environment
- hhblits v3.3.0, built from sources
- 24 cores
- AVX and SSE2 enabled, no AVX2
- 64 GB RAM
- CentOS 7.6, Linux Kernel 3.10
How can I reproduce the result for hhblits? Is there any solution?