Suggest to the user how to reduce the logic tree for a site-specific analysis
Our Canadian friends want to run risk calculations for Vancouver and would like to know how many samples they should take from the 21,000+ realizations of the full model. This is currently hard to guess and involves running many very slow calculations to manually check the stability of the results. We could instead run a classical calculation on the site of interest with full enumeration (if possible, otherwise with a lot of samples) and then call a view
oq show clusterize_hcurves:<k>
that would collect similar hazard curves into clusters (using scipy.cluster.vq.kmeans2) and would print a representative for each cluster.
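A minimal sketch of what such a view could compute, assuming the hazard curves for the site have already been extracted into an array of shape (num_rlzs, num_levels); the function and its inputs are hypothetical, not the actual engine API:

import numpy
from scipy.cluster.vq import kmeans2, whiten

def clusterize_hcurves(hcurves, rlz_strings, k):
    # cluster in log space, since the PoEs span several orders of magnitude
    obs = whiten(numpy.log(hcurves + 1E-30))
    centroids, labels = kmeans2(obs, k, minit='points')
    for c in range(k):
        idxs, = numpy.where(labels == c)
        if len(idxs) == 0:  # kmeans2 can produce empty clusters
            continue
        # take as representative the curve closest to the cluster centroid
        dists = numpy.linalg.norm(obs[idxs] - centroids[c], axis=1)
        print('cluster %d (%d rlzs): %s'
              % (c, len(idxs), rlz_strings[idxs[dists.argmin()]]))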
A possible syntax could be the following, for a case with 2187 realizations (1 source model, 7 TRTs with 3 GMPEs each, 3^7 = 2187) reduced to 9 clusters, assuming 5 TRTs are not relevant:
0~0[345][678]9[CDE][FGH][IJK]
0~2[345][678]A[CDE][FGH][IJK]
0~1[345][678]B[CDE][FGH][IJK]
0~1[345][678]9[CDE][FGH][IJK]
0~0[345][678]B[CDE][FGH][IJK]
0~2[345][678]B[CDE][FGH][IJK]
0~2[345][678]9[CDE][FGH][IJK]
0~1[345][678]A[CDE][FGH][IJK]
0~0[345][678]A[CDE][FGH][IJK]
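Here only the two letters outside the brackets vary (3 x 3 = 9 clusters), while each bracketed group collapses the 3 branches of a non-relevant TRT, so each line stands for 3^5 = 243 of the 2187 realizations. A hypothetical helper (not existing in the engine) to expand such a string into the plain realization strings it covers could be:

import itertools
import re

def expand(spec):  # e.g. "0~0[345][678]9[CDE][FGH][IJK]"
    smlt, gslt = spec.split('~')
    # each position is either a bracketed group of letters or a single letter
    groups = re.findall(r'\[([^\]]+)\]|(.)', gslt)
    choices = [grp or letter for grp, letter in groups]
    for combo in itertools.product(*choices):
        yield '%s~%s' % (smlt, ''.join(combo))

# sanity check: each clustered string covers 3^5 = 243 realizations
# len(list(expand("0~0[345][678]9[CDE][FGH][IJK]"))) -> 243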
We already have a view that connects the one-letter abbreviations to the branch IDs:
$ oq show branch_ids
| logic_tree      | abbrev | branch_id |
|-----------------+--------+-----------|
| source_model_lt | 0      | b1        |
| gsim_lt         | 0      | b31       |
| gsim_lt         | 1      | b32       |
| gsim_lt         | 2      | b33       |
| gsim_lt         | 3      | b11       |
| gsim_lt         | 4      | b12       |
| gsim_lt         | 5      | b13       |
| gsim_lt         | 6      | b61       |
| gsim_lt         | 7      | b62       |
| gsim_lt         | 8      | b63       |
| gsim_lt         | 9      | b71       |
| gsim_lt         | A      | b72       |
| gsim_lt         | B      | b73       |
| gsim_lt         | C      | b21       |
| gsim_lt         | D      | b22       |
| gsim_lt         | E      | b23       |
| gsim_lt         | F      | b41       |
| gsim_lt         | G      | b42       |
| gsim_lt         | H      | b43       |
| gsim_lt         | I      | b51       |
| gsim_lt         | J      | b52       |
| gsim_lt         | K      | b53       |
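With that mapping the representative strings can be decoded programmatically too; for instance (a sketch with a hypothetical helper, hardcoding the table above):

import re

GSIM_ABBREV = {'0': 'b31', '1': 'b32', '2': 'b33', '3': 'b11', '4': 'b12',
               '5': 'b13', '6': 'b61', '7': 'b62', '8': 'b63', '9': 'b71',
               'A': 'b72', 'B': 'b73', 'C': 'b21', 'D': 'b22', 'E': 'b23',
               'F': 'b41', 'G': 'b42', 'H': 'b43', 'I': 'b51', 'J': 'b52',
               'K': 'b53'}

def gsim_branches(spec):  # e.g. "0~0[345][678]9[CDE][FGH][IJK]"
    # yield, for each TRT, the list of GSIM branch IDs to keep
    gslt = spec.split('~')[1]
    for group in re.findall(r'\[[^\]]+\]|.', gslt):
        yield [GSIM_ABBREV[a] for a in group.strip('[]')]

# gsim_branches("0~0[345][678]9[CDE][FGH][IJK]") ->
# ['b31'], ['b11', 'b12', 'b13'], ['b61', 'b62', 'b63'], ['b71'], ...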
It is then possible to manually tweak the files source_model_logic_tree.xml and gsim_logic_tree.xml (for instance, keeping all three branches of the two relevant branch sets and a single branch for each of the five irrelevant ones) and reduce the logic tree to 9 realizations instead of 2187. The event_based_risk calculation can then be run on the reduced logic tree.
This is a good idea. We need to think carefully about the metric used to compute the distances (typically a key problem in cluster analysis). I would also suggest giving the user the possibility to define a range of probabilities to be used to extract the relevant part of the hazard curve for the cluster analysis.
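For instance, the extraction could be sketched like this (the function name and the default thresholds are made up for illustration):

import numpy

def restrict(hcurves, pmin=1E-4, pmax=1E-2):
    # keep only the intensity levels where the mean curve falls
    # inside the user-given range of probabilities
    mean = hcurves.mean(axis=0)
    ok = (mean >= pmin) & (mean <= pmax)
    return hcurves[:, ok]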