chaospy icon indicating copy to clipboard operation
chaospy copied to clipboard

Sobol index considering MvNormal distribution

Open goghino opened this issue 2 years ago • 3 comments

Describe your problem Hello, I am trying to compute the sensitivity of my model to parameters which are correlated. I use cp.MvNormal() to model the distribution of the parameters, using the Cholesky decorrelation algorithm to construct the orthogonal polynomials. But with this approach I get stuck at calling cp.Sens_m(uhat, joint) which requires joint to be stochastically independent.

Initial implementation Roughly, this is the code that I am using, failing at the last step calling cp.Sens_m()

mu = [0.05, 20]
cov = [[6.4e-05, 0.005], [0.005, 2.25]]
joint = cp.MvNormal(mu, cov)
expansion =  cp.expansion.cholesky(P, joint, normed=True)
nodes = joint.sample(N, rule="M")
data = model(nodes)
uhat = cp.fit_regression(expansion, nodes, data)
cp.Sens_m(uhat, joint)

Additional context I've also tried the approach with Rosenblatt transformation, but the 1st order Sobol indices I get seem to be wrong. The sum across the parameters is less than 1, while the total order is greater than 1 (somehow mirror image of the first order sums).

mu = [0.05, 20]
cov = [[6.4e-05, 0.005], [0.005, 2.25]]
joint = cp.MvNormal(mu, cov)
joint_U = cp.J(cp.Normal(), cp.Normal())
expansion =  cp.expansion.stieltjes(P, joint_U, normed=True)
nodes_U = joint_U.sample(N, rule="M")
nodes = joint.inv(joint_U.fwd(nodes_U))
data = model(nodes)
uhat = cp.fit_regression(expansion, nodes_U, data)
cp.Sens_m(uhat, joint_U)

Sobol_sum zone_21

Do you have any advice/recommendations what else should I try? How should I interpret the Sobol indices I've got with the Rosenblatt approach?

goghino avatar Nov 17 '21 15:11 goghino

So there are a few of things to note.

When it comes to sum smaller or larger than one, your results seems to be valid. Sum of main indices can be smaller than 1 and Sum of total indices can be larger. See wikipedia on Sobol indices.

Second, as a general rule of thumb, Sobol indices for dependent random variables is a hard problem. The basic formula breaks down as conditioning for dependent variables can quickly become paradoxes.

Third, Rosenblatt tranformation technique does not work with Sobol indices as you are now measuring the wrong thing. Sobol is a measure between input and output, but with the transformation you effectively replacing your input with new more managable ones.

So what are you left with then?

I think your best bet is Saltellis method (also described in the wikipedia article). In Chaospy these are available through cp.Sens_m_sample and cp.Sens_t_sample.

jonathf avatar Nov 17 '21 21:11 jonathf

Hi Jonathan,

I see your point about the sums, it completely makes sense!

Regarding your 3rd point, I have run an experiment, comparing 1st order Sobol indices obtained by Quasi-Monte Carlo using Saltelli method and PCE with the Rosenblatt transformation approach described above. Both approaches seem to produce the same result, expect at t=0. The model I've used is the Coffee cup, where the dependency between kappa and T_env parameters is introduced using cp.MvNormal(). The correlation between the variables is quite large, with Pearson coeff. 0.83. Could you please extend your argument why this 'Rosenblatt transformation technique does not work with Sobol indices'? Could you maybe point me to some more mathematical reasoning behind this?

SAerror_t_env.pdf SAerror_kappa.pdf

goghino avatar Nov 29 '21 19:11 goghino

Sorry the late reply. My kid's kindergarten has been hit by the omicron variant and we're stuck in quarentine.

Rosenblatt-transformation is in its essence a variable switch. Instead of looking at f(q) as a function of q, you are instead looking at f(T(r)) as a function of r. This works well in the PCE context because q = T(r), and r is typically a simpler (uncorrelated) variable compare to q.

Sobol indices is by its design an interaction between input and output. For f(q) this is straight forward, as it is obvious what is input and what is output. f(T(r)) however is not so simple. Yes, you can absolutly calculate this and get an answer, but the result is with respect to r, not q which is the original problem you want to solve for.

jonathf avatar Dec 07 '21 15:12 jonathf