
hhblits: some values (No_of_seqs & Neff) are wrong with some combinations of parameters

Open smg3d opened this issue 4 years ago • 0 comments

Expected Behavior

Values for "No_of_seqs" & "Neff" in stdout and in result file (.hhr) should be identical. Values for "No_of_seqs" & "Neff" should be correct.

Current Behavior

"No_of_seqs" & "Neff" in the result file (.hhr): the values are incorrect with every option combination tested. "No_of_seqs" & "Neff" in stdout: the values are correct with some options and incorrect with others. When wrong, the values appear to correspond to iteration n-1.

Steps to Reproduce (for bugs)

$ cat T0968s2.fasta
>T0968s2
MFIENKPGEIELLSFFESEPVSFERDNISFLYTAKNKCGLSVDFSFSVVEGWIQYTVRLHENEILHNSIDGVSSFSIRNDNLGDYIYAEIITKELINKIEIRIRPDIKIKSSSVIR

BASIC COMMAND
=============
hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -cpu 1

     OPTIONS       output   No_of_seqs      Neff    Correct?
=============================================================
-n 1 -oa3m t.a3m   stdout   83 out of 88    5.31538    Y 
-n 1 -oa3m t.a3m   t.hhr    1 out of 1      1          N
-n 2 -oa3m t.a3m   stdout   234 out of 245  8.02104    Y
-n 2 -oa3m t.a3m   t.hhr    83 out of 88    5.31538    N
-n 1               stdout   1 out of 1      1          N
-n 1               t.hhr    1 out of 1      1          N
-n 2               stdout   83 out of 88    5.31538    N
-n 2               t.hhr    83 out of 88    5.31538    N

(see more complete output below)
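
To make the stdout vs. t.hhr comparison above easier to repeat, here is a minimal sketch of a checker. It only parses the header lines in the layout shown in this issue ("No_of_seqs ..." and "Neff ..." before the hit table). The function names `parse_hhr_header` and `header_mismatches` are made up for this example and are not part of hh-suite.

```python
HEADER_FIELDS = ("No_of_seqs", "Neff")


def parse_hhr_header(text):
    """Extract the No_of_seqs and Neff header values from hhblits output text.

    Works on captured stdout or on the contents of a .hhr file, assuming the
    header layout shown in this issue.
    """
    values = {}
    for line in text.splitlines():
        # The hit table starts with a " No Hit ..." line; the header ends there.
        if line.lstrip().startswith("No Hit"):
            break
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in HEADER_FIELDS and parts[0] not in values:
            values[parts[0]] = parts[1].strip()
    return values


def header_mismatches(stdout_text, hhr_text):
    """Return {field: (stdout_value, hhr_value)} for every field that differs."""
    out = parse_hhr_header(stdout_text)
    hhr = parse_hhr_header(hhr_text)
    return {
        field: (out.get(field), hhr.get(field))
        for field in HEADER_FIELDS
        if out.get(field) != hhr.get(field)
    }
```

Usage: save stdout to a file (for example by redirecting the hhblits run), read both files as text, and call `header_mismatches(stdout_text, hhr_text)`. An empty dict means both headers agree; note that agreement alone does not mean the values are correct (see the "-n 1" and "-n 2" rows without -oa3m).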

HH-suite Output (for bugs)

BASIC COMMAND
=============
hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -cpu 1

-n 1 -oa3m t.a3m
================
stdout (No_of_seqs & Neff : correct)
------------------------------------
- 21:26:38.689 INFO: 4 sequences belonging to 4 database HMMs found with an E-value < 0.001
- 21:26:38.689 INFO: Number of effective sequences of resulting query HMM: Neff = 5.31538
Query         T0968s2
Match_columns 116
No_of_seqs    83 out of 88
Neff          5.31538
Searched_HMMs 100
Date          Wed Jul  8 21:26:38 2020
Command       hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -n 1 -cpu 1 -oa3m t.a3m
 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 tr|A0A1G5RK60|A0A1G5RK60_PHOLU 100.0 3.8E-42 2.2E-47  249.4   0.0  116    1-116     1-116 (116)
 82 tr|E4ZJN2|E4ZJN2_LEPMJ Predict  20.1 3.6E+02  0.0011   29.6   0.0   63   25-87    164-228 (646)

t.hhr (No_of_seqs & Neff : WRONG)
---------------------------------
Query         T0968s2
Match_columns 116
No_of_seqs    1 out of 1
Neff          1
Searched_HMMs 100
Date          Wed Jul  8 21:29:10 2020
Command       hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -n 1 -cpu 1 -oa3m t.a3m
 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 tr|A0A1G5RK60|A0A1G5RK60_PHOLU 100.0 3.8E-42 2.2E-47  249.4   0.0  116    1-116     1-116 (116)
 82 tr|E4ZJN2|E4ZJN2_LEPMJ Predict  20.1 3.6E+02  0.0011   29.6   0.0   63   25-87    164-228 (646)

-n 2 -oa3m t.a3m
================
stdout (No_of_seqs & Neff : correct)
------------------------------------
- 21:32:56.949 INFO: 30 sequences belonging to 30 database HMMs found with an E-value < 0.001
- 21:32:56.949 INFO: Number of effective sequences of resulting query HMM: Neff = 8.02104
Query         T0968s2
Match_columns 116
No_of_seqs    234 out of 245
Neff          8.02104
Searched_HMMs 182
Date          Wed Jul  8 21:32:56 2020
Command       hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -n 2 -cpu 1 -oa3m t.a3m
 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 tr|A0A1G5RK60|A0A1G5RK60_PHOLU 100.0 2.7E-43 1.2E-48  255.4   0.0  116    1-116     1-116 (116)
 81 tr|A0A2P4H383|A0A2P4H383_QUESU  20.7 3.4E+02  0.0011   31.3   0.0   89   20-108   177-267 (1794)

t.hhr (No_of_seqs & Neff : WRONG)
---------------------------------
Query         T0968s2
Match_columns 116
No_of_seqs    83 out of 88
Neff          5.31538
Searched_HMMs 182
Date          Wed Jul  8 21:32:56 2020
Command       hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -n 2 -cpu 1 -oa3m t.a3m
 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 tr|A0A1G5RK60|A0A1G5RK60_PHOLU 100.0 2.7E-43 1.2E-48  255.4   0.0  116    1-116     1-116 (116)

-n 1
====
stdout (No_of_seqs & Neff : WRONG)
----------------------------------
- 21:36:32.487 INFO: 4 sequences belonging to 4 database HMMs found with an E-value < 0.001
Query         T0968s2
Match_columns 116
No_of_seqs    1 out of 1
Neff          1
Searched_HMMs 100
Date          Wed Jul  8 21:36:32 2020
Command       hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -n 1 -cpu 1
 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 tr|A0A1G5RK60|A0A1G5RK60_PHOLU 100.0 3.8E-42 2.2E-47  249.4   0.0  116    1-116     1-116 (116)
 82 tr|E4ZJN2|E4ZJN2_LEPMJ Predict  20.1 3.6E+02  0.0011   29.6   0.0   63   25-87    164-228 (646)

t.hhr (No_of_seqs & Neff : WRONG)
---------------------------------
Query         T0968s2
Match_columns 116
No_of_seqs    1 out of 1
Neff          1
Searched_HMMs 100
Date          Wed Jul  8 21:36:32 2020
Command       hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -n 1 -cpu 1
 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 tr|A0A1G5RK60|A0A1G5RK60_PHOLU 100.0 3.8E-42 2.2E-47  249.4   0.0  116    1-116     1-116 (116)
 82 tr|E4ZJN2|E4ZJN2_LEPMJ Predict  20.1 3.6E+02  0.0011   29.6   0.0   63   25-87    164-228 (646)

-n 2
====
stdout (No_of_seqs & Neff : WRONG)
----------------------------------
- 21:40:23.077 INFO: 30 sequences belonging to 30 database HMMs found with an E-value < 0.001
Query         T0968s2
Match_columns 116
No_of_seqs    83 out of 88
Neff          5.31538
Searched_HMMs 182
Date          Wed Jul  8 21:40:23 2020
Command       hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -n 2 -cpu 1
 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 tr|A0A1G5RK60|A0A1G5RK60_PHOLU 100.0 2.7E-43 1.2E-48  255.4   0.0  116    1-116     1-116 (116)
 81 tr|A0A2P4H383|A0A2P4H383_QUESU  20.7 3.4E+02  0.0011   31.3   0.0   89   20-108   177-267 (1794)

t.hhr (No_of_seqs & Neff : WRONG)
---------------------------------
Query         T0968s2
Match_columns 116
No_of_seqs    83 out of 88
Neff          5.31538
Searched_HMMs 182
Date          Wed Jul  8 21:40:23 2020
Command       hhblits -i T0968s2.fasta -o t.hhr -e 0.001 -d /database/uniclust30_2018_08/uniclust30_2018_08 -n 2 -cpu 1
 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 tr|A0A1G5RK60|A0A1G5RK60_PHOLU 100.0 2.7E-43 1.2E-48  255.4   0.0  116    1-116     1-116 (116)
 81 tr|A0A2P4H383|A0A2P4H383_QUESU  20.7 3.4E+02  0.0011   31.3   0.0   89   20-108   177-267 (1794)
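
The "n-1 iteration" observation can be checked against the numbers reported above. The sketch below uses only values from this issue. It assumes that "iteration 0" means the query sequence alone (1 out of 1, Neff 1), and it takes the stdout values from the -oa3m runs as the correct ones. It only shows that the reported values are consistent with a one-iteration lag; it says nothing about the cause in the code.

```python
# (No_of_seqs, Neff) per iteration, taken from the stdout of the -oa3m runs.
# Iteration 0 is an assumption: the query alone, before any search.
correct = {
    0: ("1 out of 1", "1"),
    1: ("83 out of 88", "5.31538"),
    2: ("234 out of 245", "8.02104"),
}

# Values actually written to t.hhr for -n 1 and -n 2 (same with and without -oa3m).
hhr_reported = {
    1: ("1 out of 1", "1"),
    2: ("83 out of 88", "5.31538"),
}

# True if every .hhr header matches the correct values of the previous iteration.
lags_by_one = all(hhr_reported[n] == correct[n - 1] for n in hhr_reported)
print("t.hhr lags by one iteration:", lags_by_one)
```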

Your Environment

$ uname -a
Linux neo5 5.7.7-arch1-1 #1 SMP PREEMPT Wed, 01 Jul 2020 14:53:16 +0000 x86_64 GNU/Linux

CPU:
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

$ free
              total        used        free      shared  buff/cache   available
Mem:       32670448     7335908     6905316     1183996    18429224    23678560
Swap:             0           0           0

Version of hh-suite tested :

  • 3.2.0 (Arch AUR version : uses https://github.com/soedinglab/hh-suite/archive/v3.2.0.tar.gz)
  • also tested hhblits compiled from 3.2.0 master (2020-07-08) and from the 3.1.0 release (these hhblits binaries were called with their full path; however, if hhblits called other hh-suite programs/scripts, it likely used the system-wide ones)

smg3d, Jul 09 '20 03:07