hh-suite
hh-suite copied to clipboard
HHsearch - inconsistent results
Hi, I'm trying to use hhsearch with a customized database. Both query and database alignments represent an entire family of homologs (there is no seed sequence). Hence, I reckon that -M 50 parameter should be applied to both query and database HMMs inferred from these alignments.
I tried to apply the -M 50 parameter and then, to test if it works, I changed the order of sequences in the alignments that were used to build the query HMMs, ran hhsearch, and it seemed to be OK (the results were basically the same when compared to the results inferred from the original alignments). But then, I changed the order of sequences in the alignments that were used to build the customized database and I got different results - it seems that the -M 50 parameter does not work...
Could you please check the bash script I've been using and help me to use the tool correctly? Thanks a lot!
#!/bin/sh
cd /home/users/mvladka/miniconda3/envs/hhsuite/msa
mamba activate hhsuite
ffindex_build -s ../mydatabase.ff{data,index} .
cd ..
ffindex_apply mydatabase.ff{data,index} -i mydatabase_a3m_wo_ss.ffindex -d mydatabase_a3m_wo_ss.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0
rm mydatabase.ff{data,index}
mv mydatabase_a3m_wo_ss.ffindex mydatabase_a3m.ffindex
mv mydatabase_a3m_wo_ss.ffdata mydatabase_a3m.ffdata
ffindex_apply mydatabase_a3m.ff{data,index} -i mydatabase_hhm.ffindex -d mydatabase_hhm.ffdata -- hhmake -i stdin -o stdout -v 0
cstranslate -f -x 0.3 -c 4 -I a3m -i mydatabase_a3m -o mydatabase_cs219
sort -k3 -n -r mydatabase_cs219.ffindex | cut -f1 > sorting.dat
ffindex_order sorting.dat mydatabase_hhm.ff{data,index} mydatabase_hhm_ordered.ff{data,index}
mv mydatabase_hhm_ordered.ffindex mydatabase_hhm.ffindex
mv mydatabase_hhm_ordered.ffdata mydatabase_hhm.ffdata
ffindex_order sorting.dat mydatabase_a3m.ff{data,index} mydatabase_a3m_ordered.ff{data,index}
mv mydatabase_a3m_ordered.ffindex mydatabase_a3m.ffindex
mv mydatabase_a3m_ordered.ffdata mydatabase_a3m.ffdata
for query in $(ls *.fasta); do perl /home/users/mvladka/miniconda3/envs/hhsuite/scripts/reformat.pl fas a3m $query $(basename $query .fasta).a3m -M 50 &> $(basename $query .fasta).reformatlog; done
for a3m_query in $(ls *.a3m); do hhmake -i $a3m_query -M a3m -add_cons &> $(basename a3m_query .a3m).hhmakelog; done
for hhm in $(ls *.hhm); do hhsearch -i $hhm -d mydatabase -M a3m &> $(basename $hhm .hhm).hhsearchlog; done
Could it be because the HHsuite databases may only use the first 100 sequences (or some other number) in the alignment?