CausalDiscoveryToolbox

ANM, lack of Gamma in "Gamma HSIC"

Open ArnoVel opened this issue 5 years ago • 4 comments

Hi, this might simply be a conceptual problem, or a lack of knowledge on my part. Usually, using HSIC to compare two ANM candidates can be done by comparing the statistics directly, or by computing the related p-values. However, to compute a p-value one needs some notion of the HSIC distribution under the null. The classic paper from Gretton et al. proposes a Gamma approximation, giving specific plug-in values for the two Gamma parameters in terms of the expectation and variance of the HSIC. If I had to compute the p-value myself, I would use that approximation and evaluate the Gamma CDF parametrized by those values.
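To make the Gamma approximation described above concrete, here is a minimal sketch of how such a p-value could be computed. It is not CDT's code: the function names `rbf_gram` and `hsic_gamma_pvalue`, the Gaussian kernel, and the median-heuristic bandwidth are my own assumptions, and the plug-in mean and variance follow my reading of the Gretton et al. estimates (the mean formula assumes a kernel with k(x, x) = 1).

```python
import numpy as np
from scipy.stats import gamma


def rbf_gram(v):
    """Gaussian Gram matrix; median-heuristic bandwidth is an assumption."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    sq = np.sum((v[:, None, :] - v[None, :, :]) ** 2, axis=-1)
    med = np.median(sq[sq > 0])
    return np.exp(-sq / med)  # same as exp(-sq / (2 sigma^2)) with sigma^2 = med / 2


def hsic_gamma_pvalue(x, y):
    """Return (m * HSIC_b, p-value) under a Gamma approximation of the null."""
    m = len(x)
    if m < 6:
        raise ValueError("the variance estimate needs at least 6 samples")
    K, L = rbf_gram(x), rbf_gram(y)
    H = np.eye(m) - np.ones((m, m)) / m          # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    stat = np.sum(Kc * Lc) / m                   # m * HSIC_b

    # Plug-in estimate of Var[HSIC_b] under independence.
    V = (Kc * Lc / 6.0) ** 2
    var = (V.sum() - np.trace(V)) / (m * (m - 1))
    var *= 72.0 * (m - 4) * (m - 5) / (m * (m - 1) * (m - 2) * (m - 3))

    # Plug-in estimate of E[HSIC_b] under independence (off-diagonal means).
    mu_x = (K.sum() - np.trace(K)) / (m * (m - 1))
    mu_y = (L.sum() - np.trace(L)) / (m * (m - 1))
    mean = (1.0 + mu_x * mu_y - mu_x - mu_y) / m

    shape = mean ** 2 / var                      # Gamma shape
    scale = m * var / mean                       # Gamma scale
    p_value = gamma.sf(stat, a=shape, scale=scale)  # 1 - CDF at the statistic
    return stat, p_value
```

Called as `stat, p = hsic_gamma_pvalue(x, residuals)`, a small `p` would argue against independence of the input and the residuals, which is the quantity of interest when checking an ANM fit.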

I am aware there might be other ways to do such a thing; however, your snippet in the anm method does not seem to compute p-values, only test statistics. While I might be wrong, the variable names as well as the description of the method suggest this.

Am I wrong? Right? If either, how so?

Thanks for any additional information on this topic. Ideally, I would like to design a test that detects whether a model satisfies an ANM, with low Type I and Type II error.

ArnoVel avatar Jan 13 '20 21:01 ArnoVel

For future reference: this test essentially compares the test statistics m*HSIC_b. It is named this way not because the Gamma approximation is used here, but because the reference paper applies the Gamma approximation to the same quantity (m*HSIC_b).
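As a rough illustration of comparing m*HSIC_b across the two candidate directions, here is a self-contained sketch. It is not the CDT implementation: `hsic_stat`, `residuals`, and `anm_direction_score` are hypothetical names, and a cubic polynomial fit stands in for whatever regressor one would actually use.

```python
import numpy as np


def _centered_rbf(v):
    """Centered Gaussian Gram matrix of a 1-D sample (median heuristic assumed)."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    sq = (v - v.T) ** 2
    K = np.exp(-sq / np.median(sq[sq > 0]))
    H = np.eye(len(v)) - np.ones((len(v), len(v))) / len(v)
    return H @ K @ H


def hsic_stat(a, b):
    """Test statistic m * HSIC_b between two 1-D samples (no p-value)."""
    return np.sum(_centered_rbf(a) * _centered_rbf(b)) / len(a)


def residuals(cause, effect, degree=3):
    """Residuals of a polynomial regression of effect on cause (a stand-in regressor)."""
    coefs = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coefs, cause)


def anm_direction_score(x, y):
    """Positive values favour x -> y, negative values favour y -> x."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    forward = hsic_stat(x, residuals(x, y))    # dependence of residuals on x
    backward = hsic_stat(y, residuals(y, x))   # dependence of residuals on y
    return backward - forward
```

The score only ranks the two directions by their statistics; it says nothing about whether either direction passes an independence test at a given level, which is where a null distribution would be needed.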

ArnoVel avatar Jan 22 '20 20:01 ArnoVel

Hi, you are correct: only the test statistic is computed, not the p-value (ref: authors' code here: http://web.math.ku.dk/~peters/code.html). We might want to include the p-value computation, at least as information for users.

Feel free to make a pull request; it might take some time before I can look into it. Best regards, Diviyan

diviyank avatar Jan 29 '20 08:01 diviyank

Hi, I am a little bit busy at the moment; however, I can point to two possible sources for an easy implementation:

  • a Python copy of the original Gretton et al. MATLAB code; this uses NumPy and is vectorised on CPU only.
  • my PyTorch (GPU-compatible) update; this, however, resorts to SciPy for the inverse CDF, so while most of the computations can be performed on GPU, there is a limitation there. Also, I have a nonstandard way of specifying kernels, but that can be changed easily!

ArnoVel avatar Jan 31 '20 09:01 ArnoVel

Hi, thanks! I'll look into it. Best regards, Diviyan

diviyank avatar Jan 31 '20 13:01 diviyank