Issue about building a local database

Open mujiezhang opened this issue 3 years ago • 27 comments

I am building a database from MSAs. First I placed all of them in a single folder containing no other files and created a single FFindex database, which generated two files: 227_msa.ffdata and 227_msa.ffindex. Then I ran the command 'OMP_NUM_THREADS=1 mpirun -np 1 ffindex_apply_mpi 227_msa.ff{data,index} -i 227_a3m_wo_ss.ffindex -d 227_a3m_wo_ss.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0' and got this error:

'mpirun was unable to find the specified executable file, and therefore did not launch the job. This error was first reported for process rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpirun command line parameter option (remember that mpirun interprets the first unrecognized command line token as the executable).

Node: localhost Executable: ffindex_apply_mpi'

How can I solve this problem?

mujiezhang avatar May 24 '21 11:05 mujiezhang

Do you have mpirun, ffindex_apply_mpi and hhconsensus in your PATH? Check output of these commands:

which mpirun
which ffindex_apply_mpi 
which hhconsensus
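
The three checks above can be folded into one small helper. This is only a sketch for a POSIX shell; the function name `check_tools` is made up for illustration and is not part of hh-suite.

```shell
# check_tools is a hypothetical helper, not an hh-suite command.
# It prints where each named tool lives, or flags it as missing,
# and returns non-zero if at least one tool was not found on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if tool_path=$(command -v "$tool" 2>/dev/null); then
            echo "found: $tool -> $tool_path"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tool names are taken from the command that failed in this thread.
check_tools mpirun ffindex_apply_mpi hhconsensus || echo "add the missing tools to PATH or call them by full path"
```

Any tool reported as missing either is not installed or sits in a directory that is not on PATH.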

ksteczk avatar May 24 '21 11:05 ksteczk

I have just installed mpirun, but I cannot find ffindex_apply_mpi. I installed hhsuite through conda. Does this problem sometimes occur when hhsuite is installed through conda? How can I get ffindex_apply_mpi?

mujiezhang avatar May 24 '21 11:05 mujiezhang

You should use the full path to the ffindex_apply_mpi binary. I don't know where that might be in a conda installation...

By the way, when you use the -np 1 option in mpirun, consider skipping MPI altogether and just run:

ffindex_apply 227_msa.ff{data,index} -i 227_a3m_wo_ss.ffindex -d 227_a3m_wo_ss.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0

It does the same thing.

ksteczk avatar May 24 '21 11:05 ksteczk

Oh, thank you very much! You are so nice! The problem is solved. But I have another small question. I have several groups of proteins, and I want to find out whether one group is similar to another. My plan is to build a local hhsuite database from these protein groups and run hhsearch with each group, one by one, against that database. Is that right?

mujiezhang avatar May 24 '21 12:05 mujiezhang

Suppose you have these files: querydb_hhm.ffdata querydb_hhm.ffindex dbToBeSearched_hhm.ffdata dbToBeSearched_hhm.ffindex

You can run: ffindex_apply querydb_hhm.ffdata querydb_hhm.ffindex -i mappings.ffindex -d mappings.ffdata -- hhsearch -i stdin -o stdout -d dbToBeSearched and this will generate a third database, "mappings", containing the results.

ksteczk avatar May 24 '21 12:05 ksteczk

Sorry for my ignorance… I ran the command ‘ffindex_apply 227_msa.ff{data,index} -i 227_a3m_wo_ss.ffindex -d 227_a3m_wo_ss.ffdata -- hhconsensus -M 50 -maxres 65535 -i stdin -oa3m stdout -v 0’ and it worked. Then I ran ‘ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 27_a3m.ffdata -- addss.pl -v 0 stdin stdout’ and it also finished without complaint. But when I tried to generate the hhm files with the command ‘ffindex_apply 227_a3m.ff{data,index} -i 227_hhm.ffindex -d 227_hhm.ffdata -- hhmake -i stdin -o stdout -v 0’, I got lots of errors like this:

97.txt_muscle.msa 224 1 286 4

  • 20:28:58.692 ERROR: Error in /opt/conda/conda-bld/hhsuite_1598863433284/work/src/hhfunc.cpp:16: ReadQueryFile:

  • 20:28:58.692 ERROR: stdin is empty!

98.txt_muscle.msa 225 1 256 4

  • 20:28:58.983 ERROR: Error in /opt/conda/conda-bld/hhsuite_1598863433284/work/src/hhfunc.cpp:16: ReadQueryFile:

  • 20:28:58.983 ERROR: stdin is empty!

So I did not get the hhm.ffdata and hhm.ffindex files…

mujiezhang avatar May 24 '21 12:05 mujiezhang

Maybe something is wrong with the 227_a3m.ff{data,index} files? Look into the 227_a3m.ffdata file and check whether it contains anything. Another test is to run the command without the -v 0 option and see on which database element it fails.
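
One way to check whether the data file contains anything is to compare its total size with the number of bytes that are not NUL. This is only a sketch: `inspect_ffdata` is a hypothetical helper name, and it relies on nothing beyond the standard `wc` and `tr` utilities.

```shell
# inspect_ffdata is a hypothetical helper, not part of hh-suite.
# It reports the file size and how many bytes are not NUL, and
# returns non-zero when the file holds no real content.
inspect_ffdata() {
    file=$1
    total=$(wc -c < "$file" | tr -d ' ')
    non_nul=$(tr -d '\000' < "$file" | wc -c | tr -d ' ')
    echo "$file: $total bytes, $non_nul non-NUL bytes"
    [ "$non_nul" -gt 0 ]
}

# Example call with the file name used in this thread.
if [ -f 227_a3m.ffdata ]; then
    inspect_ffdata 227_a3m.ffdata || echo "227_a3m.ffdata has no usable content"
fi
```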

ksteczk avatar May 24 '21 12:05 ksteczk

I have checked the 227_a3m.ffdata file, and it looks wrong: all it contains is ‘^@^@^@^@^@^@^@^@…’ (a long run of ^@, i.e. NUL, characters). But when I ran the command ‘ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 27_a3m.ffdata -- addss.pl -v 0 stdin stdout’, no error was reported…

mujiezhang avatar May 24 '21 13:05 mujiezhang

Run addss.pl without -v 0.

ksteczk avatar May 24 '21 13:05 ksteczk

I ran ‘ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 227_a3m.ffdata -- addss.pl stdin stdout’ and ‘ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 227_a3m.ffdata -- addss.pl’, and they produced the same result as ‘ffindex_apply 227_a3m_wo_ss.ff{data,index} -i 227_a3m.ffindex -d 227_a3m.ffdata -- addss.pl -v 0 stdin stdout’. Normally an a3m.ffdata file is larger than the corresponding msa.ffdata, but 227_a3m.ffdata is only 227 bytes. I do not know what is wrong with it.
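
A 227-byte data file for 227 entries would be consistent with every entry holding nothing but its terminator. A sketch for checking this from the index follows. It assumes the usual three-column `.ffindex` layout (entry name, offset, length, separated by tabs) in which the length counts a trailing NUL byte; that layout is an assumption here, not something stated in this thread, and `count_empty_entries` is a hypothetical helper name.

```shell
# count_empty_entries is a hypothetical helper, not part of hh-suite.
# Assumed index layout: name <TAB> offset <TAB> length, with the length
# including one terminating NUL byte, so length <= 1 means "no content".
count_empty_entries() {
    awk -F '\t' '$3 <= 1 { empty++ } END { printf "%d of %d entries are empty\n", empty, NR }' "$1"
}

# Example call with the index name used in this thread.
if [ -f 227_a3m.ffindex ]; then
    count_empty_entries 227_a3m.ffindex
fi
```

If every entry turns out to be empty, the step that wrote the database (here addss.pl) produced no output for any input, which points at that step rather than at hhmake.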

mujiezhang avatar May 24 '21 13:05 mujiezhang

Did you set up the paths in the addss.pl script? It requires paths to psipred, as far as I recall...

ksteczk avatar May 24 '21 13:05 ksteczk

Maybe I can try installing hhsuite from source. Anyway, thanks a lot for being so patient with me. Thanks again!

mujiezhang avatar May 24 '21 13:05 mujiezhang

It won't solve your problem. You have to configure psipred anyway: hhsuite uses it, and it is an external tool that has to be connected to hhsuite.

ksteczk avatar May 24 '21 13:05 ksteczk

Oh! But I do not know how to configure psipred. Should I install it through conda?

mujiezhang avatar May 24 '21 13:05 mujiezhang

First of all, you can simply skip ss prediction and go with a3m files without ss. According to the hhsuite documentation, the sensitivity gain is small unless you are going to tune parameters more deeply.

If you want to go for ss prediction anyway, you should install psipred or compile it from source, and edit HHPaths.pm in the hhsuite scripts subdirectory to point to your local psipred installation.

ksteczk avatar May 24 '21 13:05 ksteczk

Thank you very much! Your advice is very useful! I am trying it now.

mujiezhang avatar May 24 '21 13:05 mujiezhang

Another stupid question… how do I skip ss prediction?

mujiezhang avatar May 24 '21 14:05 mujiezhang

To skip ss, just rename the a3m no_ss files to a3m and that's all - build the HMM profiles on them.
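
For the file names used earlier in this thread, the renaming step might look like the sketch below. It copies rather than renames, so the original pair is kept. `use_without_ss` is a hypothetical helper name, and the prefixes are simply the ones from the commands above.

```shell
# use_without_ss is a hypothetical helper, not part of hh-suite.
# It copies <src>.ffdata/.ffindex to <dst>.ffdata/.ffindex so that later
# steps can read the alignments without ss under the plain a3m name.
use_without_ss() {
    src=$1
    dst=$2
    for ext in ffdata ffindex; do
        cp "$src.$ext" "$dst.$ext" || return 1
    done
}

# Prefixes taken from the commands in this thread.
if [ -f 227_a3m_wo_ss.ffdata ] && [ -f 227_a3m_wo_ss.ffindex ]; then
    use_without_ss 227_a3m_wo_ss 227_a3m
fi
```

After this, the hhmake command quoted earlier in the thread can be run on 227_a3m.ff{data,index} unchanged.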

ksteczk avatar May 24 '21 14:05 ksteczk

Thanks for your useful advice! I got the final results, and I have two more questions. The result file looks something like this:

Query         lcl|MH719189.1_prot_AYD80303.1_44 [locus_tag=Fc02_44] [protein=virion structural protein] [protein_id=AYD80303.1] [location=29314..30270] [gbkey=CDS]
Match_columns 318
No_of_seqs    1 out of 4
Neff          1
Searched_HMMs 227
Date          Tue May 25 09:46:24 2021
Command       hhsearch -i stdin -o stdout -d 227 -cov 50 -qid 90

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM Template HMM
  1 lcl|MH719189.1_prot_AYD80303.1 100.0  2E-196  9E-199 1295.5   0.0  318     1-318     1-318 (318)
  2 lcl|NC_020198.1_prot_YP_007392  13.5    0.47  0.0021   24.6   0.0   25      1-25     30-54 (74)
  3 lcl|JQ067085.2_prot_AII21881.1  10.9    0.64  0.0028   24.0   0.0   12   199-210     16-27 (80)
  4 lcl|JX495042.1_prot_AFR52235.1   7.7       1  0.0045   23.9   0.0   18   274-291     69-86 (102)
  5 lcl|KM233689.1_prot_AIM40317.1   2.9     3.5   0.015   20.2   0.0   39      6-44     12-61 (87)
  6 lcl|NC_005882.1_prot_YP_024699   2.5     4.2   0.019   20.5   0.0   11   200-210     72-82 (110)
  7 lcl|NC_020198.1_prot_YP_007392   1.7     6.5   0.029   17.9   0.0   13      7-19     51-63 (67)
  8 lcl|NC_000929.1_prot_NP_050615   1.1      11   0.047   19.1   0.0   16     37-52    91-106 (176)
  9 lcl|NC_028766.1_prot_YP_009196   1.1      11   0.049   18.8   0.0   13   125-137   142-154 (158)
 10 lcl|NC_005882.1_prot_YP_024720   1.0      12   0.051   20.7   0.0   20   195-214     51-70 (401)

The database is made of 227 MSA files, and I want to know whether one MSA is similar to another. But this result only tells me which sequence is similar to which other sequence. How should I interpret it?

mujiezhang avatar May 25 '21 02:05 mujiezhang

Assuming you are running hhsearch through ffindex_apply, you get hhsearch mappings from every profile (hmm/a3m) in your query database to the profiles in the target database. You pasted a fragment of the result showing how MH719189.1_prot_AYD80303.1_44 compares to the entries in the target database. As you can see, it is similar to itself and barely similar to the remaining entries. HHsearch reports a minimum of 10 hits even if they don't meet the reliability thresholds (and more hits if it finds more similar entries in the database).

What exactly do you want to do? Compare each sequence with every other? You can assume that the remaining 217 sequences in the target database are not similar to the query.
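
To pull out only the hits worth looking at, the hit list can be filtered on the Prob column. The sketch below assumes whitespace-separated rows in which the hit name contains no spaces, as in the rows pasted in this thread. `hits_above` is a hypothetical helper name, the threshold of 90 is only an illustrative cutoff, and the sample rows are copied from the result quoted above.

```shell
# hits_above is a hypothetical helper, not part of hh-suite.
# It reads an hhsearch hit list on stdin and prints the name and Prob of
# every hit whose probability is at least the given threshold.
# Assumption: columns are whitespace-separated and hit names have no spaces,
# so Prob is always the third field.
hits_above() {
    awk -v min="$1" '$1 ~ /^[0-9]+$/ && $3 + 0 >= min + 0 { print $2, $3 }'
}

hits_above 90 <<'EOF'
No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM
1 lcl|MH719189.1_prot_AYD80303.1 100.0 2E-196 9E-199 1295.5 0.0 318 1-318 1-318 (318)
2 lcl|NC_020198.1_prot_YP_007392 13.5 0.47 0.0021 24.6 0.0 25 1-25 30-54 (74)
3 lcl|JQ067085.2_prot_AII21881.1 10.9 0.64 0.0028 24.0 0.0 12 199-210 16-27 (80)
EOF
```

With these rows only the self-hit passes the cutoff, which matches the reading above that the query resembles nothing else in the database.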

ksteczk avatar May 26 '21 07:05 ksteczk

I have 227 clusters of proteins. What I want to do is determine which protein clusters are similar to one another. So far I have aligned each protein cluster and used the alignments to build the hhsearch database, following your advice and the online documentation. Then, to compare the 227 clusters against themselves, I ran the command ‘ffindex_apply 227_hhm.ffdata 227_hhm.ffindex -i mappings.ffindex -d mappings.ffdata -- hhsearch -i stdin -o stdout -d 227’

That gave me the result file mappings.ffdata, which contains the hhsearch results. But as you can see in mappings.ffdata, I could not understand the result clearly. Does the query represent the cluster it belongs to? For example, if query sequence A belongs to cluster 1 and has a very good hit to sequence B, which belongs to cluster 2, can I say that cluster 1 is similar to cluster 2?

mujiezhang avatar May 26 '21 07:05 mujiezhang

So the interpretation is that MH719189.1_prot_AYD80303.1_44 doesn't cluster with any other MSA in the database.

ksteczk avatar May 26 '21 07:05 ksteczk

Maybe I should show you another picture. As you can see in the picture, the protein lcl | NC_019455.1_prot_YP_007002910.1_2, which belongs to protein cluster A, has two significant hits with prob>90: one is lcl | NC_018274.1_prot_YP_006560, belonging to protein cluster B, and the other is lcl | NC_005882.1_prot_YP_024689, belonging to protein cluster C. So I know for certain that lcl | NC_019455.1_prot_YP_007002910.1_2 is similar to the two hits. What I am not sure about is whether cluster A is similar to clusters B and C. Can the query sequence represent the cluster it belongs to?

发件人: Kamil 发送时间: 2021年5月26日 15:54 收件人: soedinglab/hh-suite 抄送: mujiezhang; Author 主题: Re: [soedinglab/hh-suite] issue about building local databse (#268)

So your interpretation is that MH719189.1_prot_AYD80303.1_44 doesn't cluster with any other msa in the database.

śr., 26 maj 2021 o 09:52 mujiezhang @.***> napisał(a):

I have 227 clusters of proteins. What I exactly want to do is to ensure which protein cluster are similar to another. What I have done are that I made alignment of every protein clusters and used them to make the hhsearch database as you told me before and the documents online. Then I want to compare the 227 clusters to themselves and I run the command ‘ffindex_apply 227_hhm.ffdata 227_hhm.ffindex -i mappings.ffindex -d mappings.ffdata -- hhsearch -i stdin -o stdout -d 227’

And I got the result file-mappings.ffdata which contains the hhsearch results. But as you can seen in the mappings.ffdata, I just could not understand the result clearly. Does the query represent the cluster it belongs to? For example, if the query sequence A belongs to cluster1, it has a very good hit of squences B belongs to cluster2, So can I say that the cluster1 are similar to cluster 2?

发送自 Windows 10 版邮件应用

发件人: Kamil 发送时间: 2021年5月26日 15:30 收件人: soedinglab/hh-suite 抄送: mujiezhang; Author 主题: Re: [soedinglab/hh-suite] issue about building local databse (#268)

Assuming you are running hhsearch using ffindex_apply you get hhsearch mappings between all profiles (hmm/a3m) in your query database to profiles in the target database. You pasted a fragment of the result showing how MH719189.1_prot_AYD80303.1_44 compares to entries in the target database. As you see it is similar to itself and barely similar to the remaining objects in the target database. HHsearch reports minimum 10 hits even if they don't meet reliability thresholds criteria (and more hits if it finds more similar objects in the database).

What exactly do you want to do? Compare each sequence with each? You can assume that the remaining 217 sequences in the target database are not similar to the query.

wt., 25 maj 2021 o 04:13 mujiezhang @.***> napisał(a):

Thanks for your useful advices! And I got the final results. I have two more questions. The result file is something like ‘Query lcl|MH719189.1_prot_AYD80303.1_44 [locus_tag=Fc02_44] [protein=virion structural protein] [protein_id=AYD80303.1] [location=29314..30270] [gbkey=CDS] Match_columns 318 No_of_seqs 1 out of 4 Neff 1 Searched_HMMs 227 Date Tue May 25 09:46:24 2021 Command hhsearch -i stdin -o stdout -d 227 -cov 50 -qid 90

No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM 1 lcl|MH719189.1_prot_AYD80303.1 100.0 2E-196 9E-199 1295.5 0.0 318 1-318 1-318 (318) 2 lcl|NC_020198.1_prot_YP_007392 13.5 0.47 0.0021 24.6 0.0 25 1-25 30-54 (74) 3 lcl|JQ067085.2_prot_AII21881.1 10.9 0.64 0.0028 24.0 0.0 12 199-210 16-27 (80) 4 lcl|JX495042.1_prot_AFR52235.1 7.7 1 0.0045 23.9 0.0 18 274-291 69-86 (102) 5 lcl|KM233689.1_prot_AIM40317.1 2.9 3.5 0.015 20.2 0.0 39 6-44 12-61 (87) 6 lcl|NC_005882.1_prot_YP_024699 2.5 4.2 0.019 20.5 0.0 11 200-210 72-82 (110) 7 lcl|NC_020198.1_prot_YP_007392 1.7 6.5 0.029 17.9 0.0 13 7-19 51-63 (67) 8 lcl|NC_000929.1_prot_NP_050615 1.1 11 0.047 19.1 0.0 16 37-52 91-106 (176) 9 lcl|NC_028766.1_prot_YP_009196 1.1 11 0.049 18.8 0.0 13 125-137 142-154 (158) 10 lcl|NC_005882.1_prot_YP_024720 1.0 12 0.051 20.7 0.0 20 195-214 51-70 (401)’ The database is made of 227 msa files, and I want to know whether one msa is similar to another. But this result only tell me which squence is similar to another squence. How should I understand this result?

From: Kamil. Sent: May 24, 2021, 22:11. To: soedinglab/hh-suite. Cc: mujiezhang; Author. Subject: Re: [soedinglab/hh-suite] issue about building local databse (#268)

To skip ss, just rename the a3m no_ss files to a3m and that's all. Then build the hmm profiles on them.
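A minimal Python sketch of that rename step, assuming the file naming used earlier in this thread (`227_a3m_wo_ss.ffdata` and `227_a3m_wo_ss.ffindex`). The helper name `drop_wo_ss_suffix` is hypothetical, and the target names `<stem>_a3m.ff{data,index}` are an assumption about what the later steps expect.

```python
from pathlib import Path


def drop_wo_ss_suffix(directory, stem):
    """Rename <stem>_a3m_wo_ss.ff{data,index} to <stem>_a3m.ff{data,index}.

    Refuses to overwrite an existing target so an already-built
    database is not clobbered by accident.
    """
    renamed = []
    for ext in ("ffdata", "ffindex"):
        src = Path(directory) / f"{stem}_a3m_wo_ss.{ext}"
        dst = Path(directory) / f"{stem}_a3m.{ext}"
        if dst.exists():
            raise FileExistsError(dst)
        src.rename(dst)
        renamed.append(dst)
    return renamed
```

Calling `drop_wo_ss_suffix(".", "227")` in the database directory would cover the two files from the original question.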


mujiezhang avatar May 26 '21 08:05 mujiezhang

I didn't get any picture. Anyway, we can move the discussion to regular e-mail, since the hhsuite problem is solved. Feel free to reach me at kamil dot steczkiewicz at gmail.com.

On Wed, May 26, 2021 at 10:06, mujiezhang @.***> wrote:

Maybe I should show you another picture. As you can see in the picture, the protein lcl|NC_019455.1_prot_YP_007002910.1_2, which belongs to protein cluster A, has two significant hits with prob > 90: one is lcl|NC_018274.1_prot_YP_006560, which belongs to protein cluster B, and the other is lcl|NC_005882.1_prot_YP_024689, which belongs to protein cluster C. So I know for certain that lcl|NC_019455.1_prot_YP_007002910.1_2 is similar to the two hits. What I am not sure about is whether cluster A is similar to clusters B and C. Can the query sequence represent the cluster it belongs to?

From: Kamil. Sent: May 26, 2021, 15:54. To: soedinglab/hh-suite. Cc: mujiezhang; Author. Subject: Re: [soedinglab/hh-suite] issue about building local databse (#268)

So your interpretation is that MH719189.1_prot_AYD80303.1_44 doesn't cluster with any other msa in the database.

On Wed, May 26, 2021 at 09:52, mujiezhang @.***> wrote:

I have 227 clusters of proteins. What I want to do is determine which protein clusters are similar to one another. I aligned every protein cluster and used the alignments to build the hhsearch database, following your earlier advice and the online documentation. Then, to compare the 227 clusters against themselves, I ran the command ‘ffindex_apply 227_hhm.ffdata 227_hhm.ffindex -i mappings.ffindex -d mappings.ffdata -- hhsearch -i stdin -o stdout -d 227’

I got the result file mappings.ffdata, which contains the hhsearch results. But as you can see in mappings.ffdata, I could not understand the result clearly. Does the query represent the cluster it belongs to? For example, if query sequence A belongs to cluster1 and has a very good hit to sequence B, which belongs to cluster2, can I say that cluster1 is similar to cluster2?
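If each profile name is taken to stand for the whole MSA it was built from (which is how the reply quoted above reads the result), cluster-to-cluster similarity can be collected from the per-query hits. The sketch below is a minimal Python illustration under that assumption. The names `repA`, `repB`, `repC`, the cluster labels, and the probabilities are made-up placeholders, not values from this thread.

```python
def similar_cluster_pairs(hits_by_query, cluster_of, min_prob=90.0):
    """Collect unordered pairs of distinct clusters linked by a confident hit.

    hits_by_query: {query profile name: [(hit profile name, probability), ...]}
    cluster_of:    {profile name: cluster label}
    """
    pairs = set()
    for query, hits in hits_by_query.items():
        for hit, prob in hits:
            a, b = cluster_of[query], cluster_of[hit]
            # Skip self-hits and anything below the probability cutoff.
            if a != b and prob >= min_prob:
                pairs.add(tuple(sorted((a, b))))
    return pairs


# Hypothetical example data, for illustration only.
EXAMPLE_HITS = {
    "repA": [("repA", 100.0), ("repB", 95.0), ("repC", 12.0)],
    "repB": [("repB", 100.0), ("repA", 93.0)],
}
EXAMPLE_CLUSTERS = {"repA": "A", "repB": "B", "repC": "C"}

if __name__ == "__main__":
    print(similar_cluster_pairs(EXAMPLE_HITS, EXAMPLE_CLUSTERS))
```

Storing pairs as sorted tuples means an A-to-B hit and a B-to-A hit count as the same link. Whether a single confident hit is enough to call two clusters similar is a judgment the sketch leaves to the cutoff.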


ksteczk avatar May 26 '21 08:05 ksteczk

It seems that the file is missing. Is it in the directory from which you're running the script? Are you running it locally on the same machine? Why is there an error from mpirun? How exactly did you run this?

On Wed, Apr 6, 2022, 11:22, user chao @.***> wrote:

When I enter the following command: 'ffindex_apply cluster1091_a3m_wo_ss.ff{data,index} -i cluster1091_a3m.ffindex -d cluster1091_a3m.ffdata -- addss.pl stdin stdout', I get this error: '/big/martin/hh-suite/lib/ffindex/src/ffindex_apply_mpi.c:341 ffindex_apply: cluster1091_a3m_wo_ss.ffdata: No such file or directory'. How should I solve it? Thank you.


ksteczk avatar Apr 07 '22 16:04 ksteczk


Hi ksteczk, when I run hhsearch I hit a new issue: "could not open file 'msa/HHM/Allterl_hmm_cs219.ffdata', In /big/martin/hh-suite/src/ffindexdatabase.cpp:11: FFindexDatabase:". First, I built the database from all the hmm files with ffindex_build and got the files Allterl_hmm.ffdata and Allterl_hmm.ffindex. Then I searched a single hmm file against Allterl_hmm.ffindex with hhsearch, and got this error. Can you figure it out? Much appreciated!
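The error message suggests that hhsearch appends `_cs219.ffdata` to the database prefix it was given, so a lone `Allterl_hmm.ff{data,index}` pair would not satisfy the lookup. A minimal Python sketch for checking which files exist for a given prefix follows. The suffix set `_a3m`, `_hhm`, `_cs219` is an assumption about the usual HH-suite database layout (only `_cs219` is confirmed by the error above), and `missing_db_files` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed database layout: <prefix>_a3m, <prefix>_hhm and <prefix>_cs219,
# each as an .ffdata/.ffindex pair. Only _cs219 appears in the error above.
SUFFIXES = ("_a3m", "_hhm", "_cs219")
EXTENSIONS = (".ffdata", ".ffindex")


def missing_db_files(prefix):
    """List the expected database files that do not exist for this prefix."""
    missing = []
    for suffix in SUFFIXES:
        for ext in EXTENSIONS:
            path = Path(f"{prefix}{suffix}{ext}")
            if not path.is_file():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    # The prefix below is inferred from the path in the error message.
    for name in missing_db_files("msa/HHM/Allterl_hmm"):
        print("missing:", name)
```

Running the check with the same prefix passed to `-d` shows at a glance whether the problem is a missing file or a prefix that already contains a suffix such as `_hmm`.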

lonestarling avatar Apr 26 '22 08:04 lonestarling


I also encountered this problem, did you solve it?

shikingstar avatar Jan 06 '23 08:01 shikingstar