Suggest to the user how to reduce the logic tree for a site-specific analysis

Open micheles opened this issue 3 years ago • 1 comment

Our Canadian friends want to run risk calculations for Vancouver and would like to know how many samples they should take from the 21,000+ realizations of the full model. This is currently hard to guess and requires running many very slow calculations to check the stability of the results manually. We could instead run a classical calculation on the site of interest with full enumeration (if possible, otherwise with many samples) and then call a view

oq show clusterize_hcurves:<k>

that would group similar hazard curves into k clusters (using scipy.cluster.vq.kmeans2) and print a representative for each cluster. A possible syntax could be the following, for a case with 2187 realizations (1 source model, 7 TRTs with 3 GMPEs each, 3^7 = 2187) reduced to 9 clusters, assuming 5 TRTs are not relevant; a bracketed group such as [345] stands for any of the three branches of an irrelevant TRT:

0~0[345][678]9[CDE][FGH][IJK]
0~2[345][678]A[CDE][FGH][IJK]
0~1[345][678]B[CDE][FGH][IJK]
0~1[345][678]9[CDE][FGH][IJK]
0~0[345][678]B[CDE][FGH][IJK]
0~2[345][678]B[CDE][FGH][IJK]
0~2[345][678]9[CDE][FGH][IJK]
0~1[345][678]A[CDE][FGH][IJK]
0~0[345][678]A[CDE][FGH][IJK]
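A minimal sketch of what such a view could compute, assuming the hazard curves of the site are available as a NumPy array with one row per realization. The function name clusterize_hcurves mirrors the proposed view but is hypothetical, not the oq-engine implementation. It runs scipy.cluster.vq.kmeans2 with k-means++ initialisation and, for each cluster, takes the realization closest to the centroid as its representative.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def clusterize_hcurves(hcurves, k, seed=42):
    """Hypothetical helper: group hazard curves into k clusters.

    hcurves: array of shape (R, L), one hazard curve (PoEs on L intensity
    levels) per realization. Returns (labels, reps): labels[r] is the
    cluster of realization r, reps[c] is the index of the realization
    closest to the centroid of cluster c (-1 if the cluster is empty).
    """
    data = np.asarray(hcurves, dtype=float)
    # k-means++ initialisation; a fixed seed makes the result reproducible
    centroids, labels = kmeans2(data, k, minit='++', seed=seed)
    reps = np.full(k, -1)
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:
            # plain Euclidean distance from each member to its centroid
            dist = np.linalg.norm(data[members] - centroids[c], axis=1)
            reps[c] = members[np.argmin(dist)]
    return labels, reps
```

Euclidean distance between raw PoE vectors is used here only to keep the sketch short; the representatives would then be printed with their realization paths, as in the list above.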

We already have a view that maps the one-letter abbreviations to the branch IDs:

$ oq show branch_ids
| logic_tree      | abbrev | branch_id |
|-----------------+--------+-----------|
| source_model_lt | 0      | b1        |
| gsim_lt         | 0      | b31       |
| gsim_lt         | 1      | b32       |
| gsim_lt         | 2      | b33       |
| gsim_lt         | 3      | b11       |
| gsim_lt         | 4      | b12       |
| gsim_lt         | 5      | b13       |
| gsim_lt         | 6      | b61       |
| gsim_lt         | 7      | b62       |
| gsim_lt         | 8      | b63       |
| gsim_lt         | 9      | b71       |
| gsim_lt         | A      | b72       |
| gsim_lt         | B      | b73       |
| gsim_lt         | C      | b21       |
| gsim_lt         | D      | b22       |
| gsim_lt         | E      | b23       |
| gsim_lt         | F      | b41       |
| gsim_lt         | G      | b42       |
| gsim_lt         | H      | b43       |
| gsim_lt         | I      | b51       |
| gsim_lt         | J      | b52       |
| gsim_lt         | K      | b53       |
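To check which full realizations a compact pattern covers, a pattern can be expanded with the table above. This sketch assumes that the part before ~ encodes the source model path and that each following character (or bracketed group of interchangeable characters) is one TRT position, as in the nine patterns listed earlier; parse_pattern and expand are hypothetical helpers, and the two dictionaries are copied from the branch_ids output.

```python
import itertools
import re

# Abbreviation tables copied from the `oq show branch_ids` output
SMLT = {'0': 'b1'}
GSIMLT = {
    '0': 'b31', '1': 'b32', '2': 'b33', '3': 'b11', '4': 'b12', '5': 'b13',
    '6': 'b61', '7': 'b62', '8': 'b63', '9': 'b71', 'A': 'b72', 'B': 'b73',
    'C': 'b21', 'D': 'b22', 'E': 'b23', 'F': 'b41', 'G': 'b42', 'H': 'b43',
    'I': 'b51', 'J': 'b52', 'K': 'b53',
}


def _levels(part, table):
    """One list of branch IDs per position: '[345]' -> 3 branches, '9' -> 1."""
    return [[table[ch] for ch in (group or single)]
            for group, single in re.findall(r'\[([^\]]+)\]|(.)', part)]


def parse_pattern(pattern):
    """Split 'SM~GSIMS' into one list of branch IDs per logic tree position."""
    sm, gsims = pattern.split('~')
    return _levels(sm, SMLT) + _levels(gsims, GSIMLT)


def expand(pattern):
    """All full realizations (tuples of branch IDs) matched by a pattern."""
    return list(itertools.product(*parse_pattern(pattern)))
```

Each of the nine patterns expands to 3^5 = 243 realizations, and together they cover all 9 × 243 = 2187 realizations without overlap, so the nine clusters partition the full logic tree.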

It is then possible to manually edit the files source_model_logic_tree.xml and gsim_logic_tree.xml to reduce the logic tree from 2187 realizations to 9, and to run the event_based_risk calculation on the reduced logic tree.
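The manual edit can also be scripted. A minimal sketch, assuming the NRML layout in which each logicTreeBranchSet contains logicTreeBranch elements with a branchID attribute and an uncertaintyWeight child; prune_logic_tree is a hypothetical helper, not part of oq-engine. It drops every branch not listed in keep and, in each branch set that lost branches, rescales the surviving weights so they sum to 1.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def prune_logic_tree(xml_text, keep):
    """Hypothetical helper: drop every logicTreeBranch not listed in `keep`.

    In each branch set that loses branches, the surviving uncertaintyWeight
    values are rescaled to sum to 1. Assumes the NRML layout
    logicTreeBranchSet > logicTreeBranch[@branchID] > uncertaintyWeight.
    """
    root = ET.fromstring(xml_text)
    # collect first: removing elements while iterating root.iter() is unsafe
    bsets = [el for el in root.iter() if _local(el.tag) == 'logicTreeBranchSet']
    for bset in bsets:
        branches = [br for br in bset if _local(br.tag) == 'logicTreeBranch']
        kept = [br for br in branches if br.get('branchID') in keep]
        if not kept:
            raise ValueError(f"no branch kept in {bset.get('branchSetID')}")
        if len(kept) == len(branches):
            continue  # nothing removed: leave the weights untouched
        for br in branches:
            if br.get('branchID') not in keep:
                bset.remove(br)
        weights = [w for br in kept for w in br
                   if _local(w.tag) == 'uncertaintyWeight']
        total = sum(float(w.text) for w in weights)
        for w in weights:
            w.text = str(float(w.text) / total)
    return ET.tostring(root, encoding='unicode')
```

For the 2187-realization example one would keep b31, b32, b33 and b71, b72, b73 plus one branch for each of the five irrelevant TRTs, giving 3 × 3 = 9 realizations; which branch stands in for an irrelevant TRT is the user's choice. ElementTree may rewrite namespace prefixes (e.g. ns0:) on output, but the XML stays equivalent.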

micheles, Jun 15 '21 15:06

This is a good idea. We need to think carefully about the metric used to compute distances (typically a key problem in cluster analysis). I would also suggest giving the user the option to define a range of probabilities, used to extract the part of each hazard curve that enters the cluster analysis.
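One possible reading of both suggestions, sketched under assumptions: the probability range is applied to the mean hazard curve, so every realization keeps the same intensity levels and the feature vectors stay the same length, and PoEs are compared in log10 space rather than linearly. select_poe_range and log_features are hypothetical names; the resulting matrix could be passed to scipy.cluster.vq.kmeans2 in place of the raw curves.

```python
import numpy as np


def select_poe_range(hcurves, poe_min, poe_max):
    """Hypothetical helper: keep only the intensity levels where the mean
    curve lies in [poe_min, poe_max], so clustering focuses on that part."""
    hcurves = np.asarray(hcurves, dtype=float)
    mean = hcurves.mean(axis=0)
    mask = (mean >= poe_min) & (mean <= poe_max)
    return hcurves[:, mask]


def log_features(hcurves, floor=1e-12):
    """Hypothetical helper: log10 of the PoEs, so Euclidean distance measures
    ratios of probabilities; PoEs below `floor` (e.g. zeros) are clipped to it
    to avoid -inf."""
    return np.log10(np.maximum(hcurves, floor))
```

In log10 space two curves differing by a factor of 2 are 0.301 apart at every level, whereas in linear space the same factor gives a difference of 0.5 at a PoE of 0.5 but only 0.001 at a PoE of 0.001, so low-probability levels would barely influence the clusters.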

mmpagani, Jun 15 '21 15:06