openturns icon indicating copy to clipboard operation
openturns copied to clipboard

HSIC : p-values permutation factorization

Open JPelamatti opened this issue 2 years ago • 2 comments

This PR aims at factorizing part of the computations necessary to estimate the permutation-based HSIC p-values

JPelamatti avatar May 17 '22 09:05 JPelamatti

recompile debian package: https://wiki.debian.org/BuildingTutorial#Get_the_source_package

jschueller avatar Sep 15 '22 08:09 jschueller

I found many more or less important ways to speed-up things with respect to your code. I started to implement it as a complement of your branch, but as I am really dumb with git I lost many things in the way, so I restarted a branch in my own repository... Here are the tricks, in increasing order of impact on the p-value computation by permutations:

  • use std::accumulate to sum all the elements of a Matrix instead of a loop
  • don't divide matrices twice by scalars, but divide once by their product
  • don't store all the permuted versions of objects before to aggregate them, but create them and aggregate them on the fly
  • don't discretize a permuted sample, but permute the discretization of the initial sample
  • reorder the loops with the outer loop being the loop over the permutations, then parallelize the outer loop

This way, you transform the 18.3s to get the p-values by permutations (master) into a 2.3s (my branch) with 8 cores. I used this script for the benchmark:

import openturns as ot
from openturns.usecases.wingweight_function import WingWeightModel
from time import time

NREPS = 10

# To test the effect of TBB
#ot.TBB.Disable()

ot.Log.Show(ot.Log.NONE)
m = WingWeightModel()

# %%
# HSIC indices
# ------------

# %%
# We then estimate the HSIC indices using a data-driven approach.
sizeHSIC = 500
inputDesignHSIC = m.distributionX.getSample(sizeHSIC)
outputDesignHSIC = m.model(inputDesignHSIC)

covarianceModelCollection = []

# %%
for i in range(m.dim):
    Xi = inputDesignHSIC.getMarginal(i)
    inputCovariance = ot.SquaredExponential(1)
    inputCovariance.setScale(Xi.computeStandardDeviation())
    covarianceModelCollection.append(inputCovariance)

# %%
# We define a covariance kernel associated to the output variable.
outputCovariance = ot.SquaredExponential(1)
outputCovariance.setScale(outputDesignHSIC.computeStandardDeviation())
covarianceModelCollection.append(outputCovariance)

# %%
# In this paragraph, we perform the analysis on the raw data: that is
# the global HSIC estimator.
estimatorType = ot.HSICUStat()

# %%
# We now build the HSIC estimator:


t00 = time()
globHSIC = ot.HSICEstimatorGlobalSensitivity(
    covarianceModelCollection, inputDesignHSIC, outputDesignHSIC, estimatorType
)

# %%
# We get the R2-HSIC indices:
R2HSICIndices = globHSIC.getR2HSICIndices()
print("\n Global HSIC analysis")
print("R2-HSIC Indices: ", R2HSICIndices)

# %%
# and the HSIC indices:
HSICIndices = globHSIC.getHSICIndices()
print("HSIC Indices: ", HSICIndices)

# %%
# The p-value by permutation, sequential
t0 = time()
for rep in range(NREPS):
    ot.RandomGenerator.SetSeed(0)
    globHSIC.setPermutationSize(globHSIC.getPermutationSize())
    pvperm = globHSIC.getPValuesPermutation()
t1 = time()
print("p-value (permutation, seq):", pvperm)
print("t (perm)=", (t1 - t0) / NREPS, "s")


# %%
# The p-value by permutation, parallel
try:
    t0 = time()
    for rep in range(NREPS):
        ot.RandomGenerator.SetSeed(0)
        globHSIC.setPermutationSize(globHSIC.getPermutationSize())
        pvperm = globHSIC.getPValuesPermutationParallel()
    t1 = time()
    print("p-value (permutation, par):", pvperm)
    print("t (perm)=", (t1 - t0) / NREPS, "s")
except:
    pass

# %%
# We have an asymptotic estimate of the value for this estimator.
pvas = globHSIC.getPValuesAsymptotic()
print("p-value (asymptotic): ", pvas)
t1 = time()
print("t (total)=", t1 - t00, "s")

Many other optimizations could be done, such as:

  • Provide specialized algorithms for scalar matrix weights (HSICEstimatorGlobalSensitivity, HSICEstimatorTargetSensitivity), diagonal matrix weights (HSICEstimatorConditionalSensitivity) and symmetrical matrix weights (are there non symmetrical weights on the Earth?)
  • Reorder some linear algebra
  • Other things I will find using flamegraphs on the new implementation

regislebrun avatar Jan 14 '23 15:01 regislebrun