hh-suite
Parse hhr file
Hello, I am trying to get hhblits results in a data table format. I have seen a recommendation to use the -blasttab option (see the closed issue "Script to parse HHR output file" #120).
That option, however, does not output the probabilities. Is there a way to calculate the Prob column from the data that -blasttab produces, or alternatively a way to include the probabilities in the table?
Best wishes
In case this is still relevant: I wrote an R parser for the hhr format for my own use; see the function read.hhr (?read.hhr). I hope it covers most of the variations in the hhr format. The output is a tibble that follows the hhr layout as closely as possible and has the following fields:
| Name | Type |
|---|---|
| No | integer |
| Hit.ID | character |
| Hit.Description | character |
| Q.ss_pred | character |
| Q.query | character |
| Q.consensus | character |
| Q.Start | integer |
| Q.End | integer |
| Q.Length | integer |
| T.consensus | character |
| T.Start | integer |
| T.End | integer |
| T.Length | integer |
| T.hit | character |
| T.ss_dssp | character |
| T.ss_pred | character |
| Aligned_cols | integer |
| E.value | numeric |
| Identities | numeric |
| Probab | numeric |
| Score | numeric |
| Similarity | numeric |
| Sum_probs | numeric |
| Template_Neff | numeric |
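For readers who do not use R, here is a minimal Python sketch of how the per-hit statistics (including Probab) could be pulled from the alignment section of an hhr file. It is not the read.hhr implementation. It assumes each alignment block starts with a line "No <n>", followed by a ">" line with the hit ID and description, followed by a line of key=value pairs beginning with "Probab=". The function name parse_hit_blocks and the sample text are made up for illustration, and real files may differ between HH-suite versions.

```python
import re

# key=value pairs such as "Probab=99.95" or "Identities=35%"
STAT_RE = re.compile(r"(\S+?)=(\S+)")
# A line consisting only of "No <number>" opens an alignment block
BLOCK_RE = re.compile(r"No (\d+)")


def parse_hit_blocks(text):
    """Return one dict per alignment block found in hhr-formatted text."""
    hits, current = [], None
    for line in text.splitlines():
        block = BLOCK_RE.fullmatch(line.strip())
        if block:
            current = {"No": int(block.group(1))}
            hits.append(current)
        elif current is None:
            continue  # still in the header or the summary table
        elif line.startswith(">") and "Hit.ID" not in current:
            hit_id, _, description = line[1:].partition(" ")
            current["Hit.ID"] = hit_id
            current["Hit.Description"] = description.strip()
        elif line.startswith("Probab="):
            for key, value in STAT_RE.findall(line):
                # Identities is written as a percentage, e.g. "35%"
                current[key] = float(value.rstrip("%"))
            if "Aligned_cols" in current:
                current["Aligned_cols"] = int(current["Aligned_cols"])
    return hits


# Hypothetical two-hit excerpt, written by hand for this example
SAMPLE_HITS = """No 1
>d1abca_ a.1.1.1 (A:) Example protein
Probab=99.95  E-value=1.2e-30  Score=200.54  Aligned_cols=150  Identities=35%  Similarity=0.512  Sum_probs=140.2  Template_Neff=7.500

No 2
>hit2 Another protein
Probab=12.50  E-value=3.4  Score=20.10  Aligned_cols=40  Identities=10%  Similarity=-0.050  Sum_probs=5.1  Template_Neff=3.200
"""

hits = parse_hit_blocks(SAMPLE_HITS)
```

The resulting list of dicts can be passed to a table library of your choice. The keys are kept exactly as they appear in the file (for example "E-value" rather than "E.value").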
The metadata in the header are preserved as attributes:
| Name | Type |
|---|---|
| Query | character |
| Match_columns | integer |
| No_of_seqs | character |
| Neff | integer |
| Searched_HMMs | integer |
| Date | character |
| Command | character |
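The header fields can be read in a similar way. The Python sketch below is again only an illustration, not the R code: it assumes the header consists of "Key value" lines at the top of the file and ends at the first blank line. The name parse_hhr_header and the sample text are hypothetical. Only Match_columns and Searched_HMMs are converted to integers; the other values are left as text.

```python
HEADER_FIELDS = ("Query", "Match_columns", "No_of_seqs", "Neff",
                 "Searched_HMMs", "Date", "Command")


def parse_hhr_header(text):
    """Collect the leading 'Key   value' lines of hhr text into a dict."""
    header = {}
    for line in text.splitlines():
        if not line.strip():
            break  # assumption: the header ends at the first blank line
        key, _, value = line.partition(" ")
        if key in HEADER_FIELDS:
            header[key] = value.strip()
    # Convert the two plain counts; everything else stays a string
    for key in ("Match_columns", "Searched_HMMs"):
        if key in header:
            header[key] = int(header[key])
    return header


# Hypothetical header, written by hand for this example
SAMPLE_HEADER = """Query         sp|P12345|EXAMPLE
Match_columns 150
No_of_seqs    149 out of 1204
Neff          7.2
Searched_HMMs 12345
Date          Tue Jan  1 00:00:00 2019
Command       hhblits -i query.fasta -o query.hhr

 No Hit                             Prob E-value
"""

header = parse_hhr_header(SAMPLE_HEADER)
```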
The R parser is certainly useful, and I have written a similar one myself, but I still think that adding a probability column to the -blasttab output would help a lot of people. If the tireless developers of HH-suite found the time to do it, it would be greatly appreciated (at the very least by this heavy user of HH-suite!).