
Sensitivity analysis

micheles opened this issue 4 years ago · 4 comments

This should have been implemented 10 years ago, but better late than never. The idea is to make the engine able to run sets of calculations with different values of one or more parameters.

micheles avatar Oct 09 '20 06:10 micheles

NB: @mmpagani wants to have the parameters in the job.ini, not in the command line, for better reproducibility. Here is an example that will produce 2x3=6 calculations:

[calculation]
sensitivity_analysis = {
  'area_source_discretization': [10, 20],
  'maximum_distance': [200, 300, 400]}

[2020-10-12 06:12:05 #3611 INFO] Job with {'area_source_discretization': 10, 'maximum_distance': 200}
[2020-10-12 06:12:05 #3612 INFO] Job with {'area_source_discretization': 10, 'maximum_distance': 300}
[2020-10-12 06:12:05 #3613 INFO] Job with {'area_source_discretization': 10, 'maximum_distance': 400}
[2020-10-12 06:12:05 #3614 INFO] Job with {'area_source_discretization': 20, 'maximum_distance': 200}
[2020-10-12 06:12:05 #3615 INFO] Job with {'area_source_discretization': 20, 'maximum_distance': 300}
[2020-10-12 06:12:05 #3616 INFO] Job with {'area_source_discretization': 20, 'maximum_distance': 400}
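For illustration only, here is a minimal Python sketch (not the actual engine code; the expand helper is hypothetical) of how a sensitivity_analysis dictionary like the one above could be turned into one set of parameter overrides per job via a Cartesian product:

import itertools

# the dictionary from the job.ini example above
sensitivity_analysis = {
    'area_source_discretization': [10, 20],
    'maximum_distance': [200, 300, 400]}

def expand(params):
    # hypothetical helper: yield one dict of overrides per combination
    names = list(params)
    for values in itertools.product(*(params[name] for name in names)):
        yield dict(zip(names, values))

for overrides in expand(sensitivity_analysis):
    print('Job with', overrides)  # 2 x 3 = 6 jobs, as in the log above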

micheles avatar Oct 09 '20 07:10 micheles

Some ideas for more flexibility around the syntax, borrowed directly from Caliban:

It might make sense to provide a keyword argument called --sensitivity-config-file or --experiment-config-file that accepts a path to a TOML file, say Sensitivity_Analysis.toml, on the local machine. That way the sensitivity experiments can be kept sandboxed and separate from the main job file. Values for any parameters provided in this sensitivity analysis config file would supersede the values for the same parameters in the main job file.

Sensitivity_Analysis.toml Basic Format

maximum_distance = [ 300, 500 ]
area_source_discretization = [ 5, 10 ]
truncation_level = 3
ignore_covs = [ true, false ]

For this particular sensitivity analysis input file, the engine would create and run 8 different jobs with the following combinations of parameters, one combination for each job:

maximum_distance  area_source_discretization  truncation_level  ignore_covs
300               5                            3                 true
300               5                            3                 false
300               10                           3                 true
300               10                           3                 false
500               5                            3                 true
500               5                            3                 false
500               10                           3                 true
500               10                           3                 false

Rules for value expansion in the basic case:

  • int, float, and string values are passed on to every job untouched (truncation_level = 3 in the above example)
  • Lists spawn multiple jobs. The engine would take the Cartesian product of all list-type values and generate a job for each combination. The three lists of length 2 in the above example result in 8 total jobs, one for each possible combination of items from the lists, as sketched below.
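A minimal sketch of these two rules, assuming Python 3.11's tomllib and an inline copy of the config above (the expand function is purely illustrative, not an existing engine feature):

import itertools
import tomllib  # Python 3.11+; older versions would need a third-party parser

toml_text = """
maximum_distance = [ 300, 500 ]
area_source_discretization = [ 5, 10 ]
truncation_level = 3
ignore_covs = [ true, false ]
"""
config = tomllib.loads(toml_text)

def expand(table):
    # scalars are passed to every job unchanged
    fixed = {k: v for k, v in table.items() if not isinstance(v, list)}
    # list values are crossed with a Cartesian product, one job per combination
    varying = {k: v for k, v in table.items() if isinstance(v, list)}
    names = list(varying)
    for values in itertools.product(*(varying[name] for name in names)):
        yield {**fixed, **dict(zip(names, values))}

for job_params in expand(config):
    print(job_params)  # 2 x 2 x 2 = 8 jobs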

Lists of Sensitivity Analysis Experiments

The user might define a LIST of sensitivity analysis experiments in the Sensitivity_Analysis.toml using an array of tables; in this case, the engine would expand each entry in the list recursively. This would make it possible to generate experiment configs that aren’t strict Cartesian products.

[[analyses]]
maximum_distance = [ 300, 500 ]
area_source_discretization = [ 5, 10, 15 ]
truncation_level = 3
ignore_covs = [ true, false ]

[[analyses]]
maximum_distance = [ 300, 500 ]
area_source_discretization = 10
truncation_level = [ 2, 4 ]

[[analyses]]
maximum_distance = 1_000
area_source_discretization = 1
truncation_level = 5
ignore_covs = false

This config file would generate:

  • 12 combinations for the first table;
  • 4 combinations for the second table;
  • 1 static combo for the third entry

for a total of 17 jobs. The user could pass the keyword argument --dry-run (see below) to the engine to see what jobs would be generated for a given sensitivity config file, or simply to validate that it is well-formed.
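Again purely as an illustration (the expand helper is hypothetical and just repeats the basic rule above), each [[analyses]] table could be expanded independently and the results chained:

import itertools
import tomllib

def expand(table):
    # same basic rule as in the previous sketch: scalars pass through,
    # list values are crossed with a Cartesian product
    fixed = {k: v for k, v in table.items() if not isinstance(v, list)}
    varying = {k: v for k, v in table.items() if isinstance(v, list)}
    names = list(varying)
    for values in itertools.product(*(varying[name] for name in names)):
        yield {**fixed, **dict(zip(names, values))}

with open('Sensitivity_Analysis.toml', 'rb') as f:  # the file shown above
    config = tomllib.load(f)

# each [[analyses]] table is expanded independently and the results chained,
# so the output is a union of Cartesian products rather than one big product
jobs = [params for table in config['analyses'] for params in expand(table)]
print(len(jobs))  # 12 + 4 + 1 = 17 jobs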

Compound Keys

By default, an experiment specification in which multiple values are lists will be expanded using a Cartesian product, as described above. If the user wishes to have multiple arguments vary in concert, they can use a compound key. For example, the following (without compound keys) experiment config file will result in four jobs total:

maximum_distance = [ 300, 500 ]
area_source_discretization = [ 5, 10 ]

Results in:

maximum_distance  area_source_discretization
300               5
300               10
500               5
500               10

To tie particular values of maximum_distance and area_source_discretization together, the user could specify them using a compound key:

"[ maximum_distance, area_source_discretization ]" = [ [ 300, 10 ], [ 500, 5 ] ]

This will result in only two jobs:

maximum_distance  area_source_discretization
300               10
500               5

Note that compound keys are not part of the TOML specification itself; hence the quotes around the list of keys in the above example, which the engine would need to parse.
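A possible way to parse and expand such compound keys, sketched in Python with hypothetical helper names:

def split_compound_key(key):
    # hypothetical helper: "[ a, b ]" -> ['a', 'b']; a plain key -> [key]
    key = key.strip()
    if key.startswith('[') and key.endswith(']'):
        return [name.strip() for name in key[1:-1].split(',')]
    return [key]

def expand_compound(table):
    # tied parameters vary together: each element of the value list provides
    # one value per name, instead of being crossed in a Cartesian product
    for key, values in table.items():
        names = split_compound_key(key)
        if len(names) > 1:
            for combo in values:
                yield dict(zip(names, combo))

table = {"[ maximum_distance, area_source_discretization ]":
         [[300, 10], [500, 5]]}
for job_params in expand_compound(table):
    print(job_params)  # only two jobs, one per tied pair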

The --dry-run Keyword Argument

Passing a --sensitivity-config-file to the engine could potentially submit many, many jobs. To verify that there are no unforeseen errors and that the engine is submitting the number of jobs that the user expects, the user can test the setup by passing the --dry-run flag to the engine rather than the usual --run command, like this:

oq engine --dry-run --sensitivity-config-file ~/path/to/sensitivity_config.toml job.ini

--dry-run would trigger all of the logging that the user would see on job submission, so they can verify that the settings are correct. It would skip the actual calculation and post-processing phases, returning immediately with no side effects other than the logging.

Once the user is sure that the jobs that will be generated look good and pass all validations, they can remove the --dry-run flag and use the --run flag to submit all jobs.

Ref: https://caliban.readthedocs.io/en/latest/explore/experiment_broadcasting.html

raoanirudh avatar Oct 24 '20 10:10 raoanirudh

I very much like the idea of keeping the sensitivity analysis information in a file separate from the job.ini.

mmpagani avatar Oct 26 '20 07:10 mmpagani

Too many features, Anirudh! Let's see how it goes as it is now. I see the point of using a separate file, though.

micheles avatar Oct 28 '20 07:10 micheles

Let's close this since I have no plan to add more features in the near future.

micheles avatar Jan 24 '23 05:01 micheles