
investigate and implement alternative sampling strategies to allow for better performance and convergence

Open johcarter opened this issue 4 years ago • 2 comments

Issue Description

Investigate and implement alternative sampling strategies to allow for better performance and convergence (i.e. lower sample numbers). Nazare Workflow_Id 1000, Task_Id 1004.

In the 'Sampling Strategies and Convergence' paper written by OasisLMF in 2013, we looked at stratified and antithetic sampling techniques in addition to simple random sampling (brute force). We found that although they can reduce the number of samples required, with the greatest benefit from stratified antithetic sampling, the benefit depends on the probability distribution and the quantile. We examined an empirical distribution, a Bernoulli (end points) distribution, a beta distribution and a uniform distribution: the uniform distribution benefits the most and the Bernoulli distribution the least. High-end quantiles benefit relatively least from stratified sampling. To deal with this variability, we recommend running Oasis with different strategies to determine the most efficient sampling strategy for the problem in hand. As a rule of thumb, though, we suggest stratified antithetic sampling as the most efficient method (of those we examined), with the number of strata depending on the quantile of interest when quantiles are the metric of interest.

OasisSamplingStrategiesAndConvergence_v1.pdf

The zip contains a spreadsheet demonstrating how to generate stratified, antithetic and stratified antithetic random numbers from simple random numbers.

Oasis Sampling Strategies v2.zip

Suggested first steps

  • [ ] Test convergence rates on a real model using a random number file containing the random numbers under the 3 strategies
  • [ ] Write Python prototype code generating the stratified and antithetic random numbers
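
As a starting point for the prototype step, here is a minimal sketch of how the three kinds of random numbers could be derived from simple uniform random numbers. It is not taken from the spreadsheet in the zip; the function names and the particular stratified antithetic construction (stratify first, then mirror each number) are assumptions made for illustration.

```python
import random


def simple_uniforms(n, seed):
    """Draw n simple random numbers in [0, 1) from a seeded generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def stratified(u):
    """Place the k-th simple random number inside the k-th of n equal-width strata."""
    n = len(u)
    return [(k + x) / n for k, x in enumerate(u)]


def antithetic(u):
    """Pair every number x with its mirror image 1 - x (returns 2n numbers).

    Note: the mirror of exactly 0.0 is 1.0, so the output lies in [0, 1].
    """
    out = []
    for x in u:
        out.extend((x, 1.0 - x))
    return out


def stratified_antithetic(u):
    """Assumed construction: stratify first, then mirror each stratified number.

    With n strata, x in stratum k is mirrored into stratum n - 1 - k,
    so every stratum ends up holding two numbers.
    """
    return antithetic(stratified(u))


u = simple_uniforms(8, seed=1)
s = stratified(u)
a = antithetic(u)
sa = stratified_antithetic(u)
```

The antithetic pairs always average to 0.5, and the stratified numbers cover the unit interval evenly, which is where the reduction in sampling error comes from. The numbers would still need to be written out in whatever random number file format the model run expects.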

Example data / logs

johcarter • Oct 05 '21 13:10

Following discussions in the Sampling Subgroup meetings, we have investigated Latin Hypercube Sampling and Sobol Sequences, which are well established stratified sampling strategies in one or more dimensions.

The tests we have done are:

  1. The computing time taken to draw random numbers under LHS/Sobol compared with simple uniform random samples. The difference appears to be negligible and scales with increasing sample size.
  2. How well samples drawn from a cdf reproduce the original distribution under LHS/Sobol compared with simple random samples, for a variety of parameterised distributions, including ones with a low chance of loss. We quantified the sampling error and found it to be much lower for LHS/Sobol than for simple uniform sampling for the same number of samples.
  3. The improvement in convergence of AAL under Sobol sequences compared with simple random sampling, for a large set of damage cdfs from a real model and the same number of samples.

There appear to be clear benefits in convergence from using Sobol sequences/LHS compared with simple random sampling, with a negligible performance difference in drawing the random numbers across the sampling strategies.
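
The kind of comparison described in test 2 can be sketched as follows. This is not the code used for the tests above: the damage distribution (a 10% chance of loss with a uniform damage ratio), the function names and the sample counts are all hypothetical, and only one-dimensional LHS is shown.

```python
import random
import statistics

P_LOSS = 0.1               # hypothetical chance of any loss
TRUE_MEAN = P_LOSS * 0.5   # mean damage ratio of the toy distribution below


def inverse_cdf(u):
    """Toy damage cdf: zero loss with probability 1 - P_LOSS,
    otherwise a damage ratio uniform between 0 and 1."""
    if u < 1.0 - P_LOSS:
        return 0.0
    return (u - (1.0 - P_LOSS)) / P_LOSS


def simple_random(n, rng):
    """n simple uniform random numbers."""
    return [rng.random() for _ in range(n)]


def latin_hypercube_1d(n, rng):
    """One random number in each of n equal-width strata, in shuffled order."""
    u = [(k + rng.random()) / n for k in range(n)]
    rng.shuffle(u)
    return u


def rmse_of_mean(sampler, n, repeats, seed):
    """Root mean square error of the sample mean over repeated experiments."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(repeats):
        estimate = statistics.fmean(inverse_cdf(u) for u in sampler(n, rng))
        squared_errors.append((estimate - TRUE_MEAN) ** 2)
    return statistics.fmean(squared_errors) ** 0.5


rmse_simple = rmse_of_mean(simple_random, n=100, repeats=200, seed=42)
rmse_lhs = rmse_of_mean(latin_hypercube_1d, n=100, repeats=200, seed=42)
print(f"simple random RMSE: {rmse_simple:.5f}")
print(f"1-D LHS RMSE:       {rmse_lhs:.5f}")
```

For this toy distribution the stratified error should come out well below the simple random error for the same number of samples, because the strata guarantee that the rare loss-producing part of the cdf is sampled in every experiment.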

Number of dimensions to sample across

For these strategies, the options are to sample in one dimension (across the samples for each event and coverage, where the user selects the total number of samples) or in two dimensions (across the samples and across the coverages for a given event). Although two-dimensional sampling may result in the fastest convergence, it is not possible to seed each coverage separately, because a whole array of random numbers is generated from a single seed for each event. This means we cannot achieve repeatability of losses for the same location when it is part of two different analyses. Single-dimensional stratification therefore seems to be the best choice given the repeatability requirement.
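
The repeatability argument can be illustrated with a small sketch. The seed rule, ids and function names below are hypothetical and are not the Oasis seeding scheme. The point is only that when each (event, coverage) pair gets its own seed and its own one-dimensional stratified sample, the numbers for a coverage do not depend on which other coverages are in the analysis.

```python
import random


def coverage_seed(event_id, coverage_id):
    """Hypothetical seed rule: any deterministic function of the two ids would do."""
    return (event_id * 1_000_003 + coverage_id) % (2 ** 32)


def stratified_samples(event_id, coverage_id, n_samples):
    """One-dimensional stratified random numbers for a single event and coverage."""
    rng = random.Random(coverage_seed(event_id, coverage_id))
    u = [(k + rng.random()) / n_samples for k in range(n_samples)]
    rng.shuffle(u)  # randomise which sample index gets which stratum
    return u


def run_analysis(event_id, coverage_ids, n_samples):
    """Random numbers for every coverage in an analysis, keyed by coverage id."""
    return {c: stratified_samples(event_id, c, n_samples) for c in coverage_ids}


# Coverage 103 appears in two analyses with different portfolios.
analysis_a = run_analysis(event_id=7, coverage_ids=[101, 102, 103], n_samples=10)
analysis_b = run_analysis(event_id=7, coverage_ids=[103, 250], n_samples=10)
print(analysis_a[103] == analysis_b[103])
```

A two-dimensional scheme would instead generate the whole samples-by-coverages array for the event from one seed, so the numbers assigned to coverage 103 would change with the number and order of the other coverages.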

The final decision to be made is between the Sobol and LHS strategies.

johcarter • Feb 11 '22 14:02

The sampling strategies have been investigated in this Jupyter notebook (v0.6):

mtazzari • Feb 11 '22 16:02