openturns HSIC : p-values permutation factorization

HSIC : p-values permutation factorization

Open JPelamatti opened this issue 2 years ago • 2 comments

This PR aims at factorizing part of the computations necessary to estimate the permutation-based HSIC p-values

May 17 '22 09:05 JPelamatti

recompile debian package: https://wiki.debian.org/BuildingTutorial#Get_the_source_package

Sep 15 '22 08:09 jschueller

Below are some useful links :

Sep 15 '22 08:09 mbaudin47

I found many more or less important ways to speed-up things with respect to your code. I started to implement it as a complement of your branch, but as I am really dumb with git I lost many things in the way, so I restarted a branch in my own repository... Here are the tricks, in increasing order of impact on the p-value computation by permutations:

use std::accumulate to sum all the elements of a Matrix instead of a loop
don't divide matrices twice by scalars, but divide once by their product
don't store all the permuted versions of objects before to aggregate them, but create them and aggregate them on the fly
don't discretize a permuted sample, but permute the discretization of the initial sample
reorder the loops with the outer loop being the loop over the permutations, then parallelize the outer loop

This way, you transform the 18.3s to get the p-values by permutations (master) into a 2.3s (my branch) with 8 cores. I used this script for the benchmark:

import openturns as ot
from openturns.usecases.wingweight_function import WingWeightModel
from time import time

NREPS = 10

# To test the effect of TBB
#ot.TBB.Disable()

ot.Log.Show(ot.Log.NONE)
m = WingWeightModel()

# %%
# HSIC indices
# ------------

# %%
# We then estimate the HSIC indices using a data-driven approach.
sizeHSIC = 500
inputDesignHSIC = m.distributionX.getSample(sizeHSIC)
outputDesignHSIC = m.model(inputDesignHSIC)

covarianceModelCollection = []

# %%
for i in range(m.dim):
    Xi = inputDesignHSIC.getMarginal(i)
    inputCovariance = ot.SquaredExponential(1)
    inputCovariance.setScale(Xi.computeStandardDeviation())
    covarianceModelCollection.append(inputCovariance)

# %%
# We define a covariance kernel associated to the output variable.
outputCovariance = ot.SquaredExponential(1)
outputCovariance.setScale(outputDesignHSIC.computeStandardDeviation())
covarianceModelCollection.append(outputCovariance)

# %%
# In this paragraph, we perform the analysis on the raw data: that is
# the global HSIC estimator.
estimatorType = ot.HSICUStat()

# %%
# We now build the HSIC estimator:


t00 = time()
globHSIC = ot.HSICEstimatorGlobalSensitivity(
    covarianceModelCollection, inputDesignHSIC, outputDesignHSIC, estimatorType
)

# %%
# We get the R2-HSIC indices:
R2HSICIndices = globHSIC.getR2HSICIndices()
print("\n Global HSIC analysis")
print("R2-HSIC Indices: ", R2HSICIndices)

# %%
# and the HSIC indices:
HSICIndices = globHSIC.getHSICIndices()
print("HSIC Indices: ", HSICIndices)

# %%
# The p-value by permutation, sequential
t0 = time()
for rep in range(NREPS):
    ot.RandomGenerator.SetSeed(0)
    globHSIC.setPermutationSize(globHSIC.getPermutationSize())
    pvperm = globHSIC.getPValuesPermutation()
t1 = time()
print("p-value (permutation, seq):", pvperm)
print("t (perm)=", (t1 - t0) / NREPS, "s")


# %%
# The p-value by permutation, parallel
try:
    t0 = time()
    for rep in range(NREPS):
        ot.RandomGenerator.SetSeed(0)
        globHSIC.setPermutationSize(globHSIC.getPermutationSize())
        pvperm = globHSIC.getPValuesPermutationParallel()
    t1 = time()
    print("p-value (permutation, par):", pvperm)
    print("t (perm)=", (t1 - t0) / NREPS, "s")
except:
    pass

# %%
# We have an asymptotic estimate of the value for this estimator.
pvas = globHSIC.getPValuesAsymptotic()
print("p-value (asymptotic): ", pvas)
t1 = time()
print("t (total)=", t1 - t00, "s")

Many other optimizations could be done, such as:

Provide specialized algorithms for scalar matrix weights (HSICEstimatorGlobalSensitivity, HSICEstimatorTargetSensitivity), diagonal matrix weights (HSICEstimatorConditionalSensitivity) and symmetrical matrix weights (are there non symmetrical weights on the Earth?)
Reorder some linear algebra
Other things I will find using flamegraphs on the new implementation

Jan 14 '23 15:01 regislebrun

openturns openturns copied to clipboard

HSIC : p-values permutation factorization

openturns
openturns copied to clipboard