oq-engine
Sensitivity analysis
This should have been implemented 10 years ago, but better late than never. The idea is to make the engine able to run sets of calculations with different values of one or more parameters.
NB: @mmpagani wants to have the parameters in the job.ini, not in the command line, for better reproducibility. Here is an example that will produce 2x3=6 calculations:
```ini
[calculation]
sensitivity_analysis = {
  'area_source_discretization': [10, 20],
  'maximum_distance': [200, 300, 400]}
```
```
[2020-10-12 06:12:05 #3611 INFO] Job with {'area_source_discretization': 10, 'maximum_distance': 200}
[2020-10-12 06:12:05 #3612 INFO] Job with {'area_source_discretization': 10, 'maximum_distance': 300}
[2020-10-12 06:12:05 #3613 INFO] Job with {'area_source_discretization': 10, 'maximum_distance': 400}
[2020-10-12 06:12:05 #3614 INFO] Job with {'area_source_discretization': 20, 'maximum_distance': 200}
[2020-10-12 06:12:05 #3615 INFO] Job with {'area_source_discretization': 20, 'maximum_distance': 300}
[2020-10-12 06:12:05 #3616 INFO] Job with {'area_source_discretization': 20, 'maximum_distance': 400}
```
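The expansion behind the six log lines above can be sketched in a few lines of Python; `expand_params` is a hypothetical helper for illustration, not part of the engine API:

```python
from itertools import product

# Hypothetical sketch of how the engine could expand the
# sensitivity_analysis dictionary into one job per combination of
# parameter values; not actual engine code.
def expand_params(sensitivity):
    names = list(sensitivity)
    for values in product(*sensitivity.values()):
        yield dict(zip(names, values))

sensitivity_analysis = {
    'area_source_discretization': [10, 20],
    'maximum_distance': [200, 300, 400]}

jobs = list(expand_params(sensitivity_analysis))
print(len(jobs))  # 6 jobs, as in the log above
print(jobs[0])    # {'area_source_discretization': 10, 'maximum_distance': 200}
```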
Some ideas for more flexibility around the syntax, borrowed directly from Caliban:
Perhaps it would make sense to provide a keyword argument called `--sensitivity-config-file` or `--experiment-config-file` that accepts a path to a TOML file, say `Sensitivity_Analysis.toml`, on the local machine. That way the sensitivity experiments can be kept sandboxed and separate from the main job file. Values for any parameters provided in this sensitivity analysis config file would supersede values for the same parameters provided in the main job file.
Sensitivity_Analysis.toml Basic Format
```toml
maximum_distance = [ 300, 500 ]
area_source_discretization = [ 5, 10 ]
truncation_level = 3
ignore_covs = [ true, false ]
```
For this particular sensitivity analysis input file, the engine would create and run 8 different jobs, one for each of the following combinations of parameters:
| maximum_distance | area_source_discretization | truncation_level | ignore_covs |
|---|---|---|---|
| 300 | 5 | 3 | true |
| 300 | 5 | 3 | false |
| 300 | 10 | 3 | true |
| 300 | 10 | 3 | false |
| 500 | 5 | 3 | true |
| 500 | 5 | 3 | false |
| 500 | 10 | 3 | true |
| 500 | 10 | 3 | false |
Rules for value expansion in the basic case:

- `int`, `float`, and `string` values are passed on to every job untouched (`truncation_level = 3` in the above example).
- Lists spawn multiple jobs. The engine would take the Cartesian product of all list-type values and generate a job for each combination. The three lists of length 2 in the above example result in 8 total jobs, one for each possible combination of items from each list.
Lists of Sensitivity Analysis Experiments
The user might define a LIST of sensitivity analysis experiments in the Sensitivity_Analysis.toml using an array of tables; in this case, the engine would expand each entry in the list recursively. This would make it possible to generate experiment configs that aren’t strict Cartesian products.
```toml
[[analyses]]
maximum_distance = [ 300, 500 ]
area_source_discretization = [ 5, 10, 15 ]
truncation_level = 3
ignore_covs = [ true, false ]

[[analyses]]
maximum_distance = [ 300, 500 ]
area_source_discretization = 10
truncation_level = [ 2, 4 ]

[[analyses]]
maximum_distance = 1_000
area_source_discretization = 1
truncation_level = 5
ignore_covs = false
```
This config file would generate:

- 12 combinations for the first table;
- 4 combinations for the second table;
- 1 static combination for the third entry;

for a total of 17 jobs. The user could pass the keyword argument `--dry-run` (see below) to the engine to preview the jobs that would be generated for a given sensitivity config file, or simply to validate that it is well-formed.
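Expanding each table independently and concatenating the results can be sketched as follows, assuming the `[[analyses]]` tables have already been parsed (e.g. with `tomllib`) into a list of dicts; `expand_table` is a hypothetical helper, not engine code:

```python
from itertools import product

# Hypothetical sketch: each [[analyses]] table is expanded on its
# own and the resulting jobs are concatenated, so the overall set of
# jobs need not be a strict Cartesian product.
def expand_table(table):
    keys = list(table)
    # a scalar behaves like a single-element list
    value_lists = [v if isinstance(v, list) else [v] for v in table.values()]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]

# mirrors the TOML above, assumed already parsed into Python dicts
analyses = [
    {'maximum_distance': [300, 500],
     'area_source_discretization': [5, 10, 15],
     'truncation_level': 3,
     'ignore_covs': [True, False]},
    {'maximum_distance': [300, 500],
     'area_source_discretization': 10,
     'truncation_level': [2, 4]},
    {'maximum_distance': 1_000,
     'area_source_discretization': 1,
     'truncation_level': 5,
     'ignore_covs': False},
]

jobs = [job for table in analyses for job in expand_table(table)]
print(len(jobs))  # 12 + 4 + 1 = 17 jobs
```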
Compound Keys
By default, an experiment specification in which multiple values are lists will be expanded using a Cartesian product, as described above. If the user wishes to have multiple parameters vary in concert, they can use a compound key. For example, the following experiment config file (without compound keys) will result in four jobs total:
```toml
maximum_distance = [ 300, 500 ]
area_source_discretization = [ 5, 10 ]
```
Results in:
| maximum_distance | area_source_discretization |
|---|---|
| 300 | 5 |
| 300 | 10 |
| 500 | 5 |
| 500 | 10 |
To tie particular values of `maximum_distance` and `area_source_discretization` together, the user could specify them using a compound key:

```toml
"[ maximum_distance, area_source_discretization ]" = [ [ 300, 10 ], [ 500, 5 ] ]
```
This will result in only two jobs:

| maximum_distance | area_source_discretization |
|---|---|
| 300 | 10 |
| 500 | 5 |
Note that compound keys are not available in pure TOML, hence the quotes around the list of keys in the above example; the engine would need to parse the quoted key itself.
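One way the engine could parse such a quoted compound key and tie its values together is sketched below; `parse_key` and `expand_compound` are hypothetical helpers, not part of TOML or the engine:

```python
from itertools import product

# Sketch of compound-key expansion: a key such as
# "[ maximum_distance, area_source_discretization ]" is parsed into a
# tuple of parameter names, and each entry of its value list supplies
# one value per name, so the parameters vary in concert rather than
# being crossed in a Cartesian product.
def parse_key(key):
    key = key.strip()
    if key.startswith('[') and key.endswith(']'):
        return tuple(name.strip() for name in key[1:-1].split(','))
    return (key,)

def expand_compound(table):
    groups = []  # list of (parameter names, list of value tuples)
    for key, values in table.items():
        names = parse_key(key)
        if not isinstance(values, list):
            values = [values]
        if len(names) == 1:
            groups.append((names, [(v,) for v in values]))
        else:
            groups.append((names, [tuple(v) for v in values]))
    jobs = []
    for combo in product(*(vals for _, vals in groups)):
        job = {}
        for (names, _), values in zip(groups, combo):
            job.update(zip(names, values))
        jobs.append(job)
    return jobs

table = {"[ maximum_distance, area_source_discretization ]":
             [[300, 10], [500, 5]]}
jobs = expand_compound(table)
print(len(jobs))  # only two jobs, with the paired values kept together
```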
The `--dry-run` Keyword Argument

Passing a `--sensitivity-config-file` to the engine could potentially submit many, many jobs. To verify that there are no unforeseen errors and that the engine is submitting the number of jobs that the user expects, the user can test the setup by passing the `--dry-run` flag to the engine rather than the usual `--run` command, like this:

```shell
oq engine --dry-run --sensitivity-config-file ~/path/to/sensitivity_config.toml job.ini
```

`--dry-run` will trigger all of the logging side effects that the user would see on job submission, so the user can verify that all of the settings are correct. This command will skip the actual calculation and post-processing phases, so it will return immediately with no side effects other than the logging.
Once the user is sure that the jobs to be generated look good and pass all validations, they can remove the `--dry-run` flag and use the `--run` flag to submit all jobs.
Ref: https://caliban.readthedocs.io/en/latest/explore/experiment_broadcasting.html
I very much like the idea of keeping the file with the sensitivity analysis information separate from the job.ini.
Too many features, Anirudh! Let's see how it goes as it is now. I see the point of using a separate file, though.
Let's close this since I have no plan to add more features in the near future.