Allow CMI to be approximated with Chi-square in large-sample sizes?

Open adam2392 opened this issue 2 years ago • 1 comments

LGTM. One comment. The CMI value that is calculated as follows:

val = hxyz - (hxz + hyz - hz).mean()

Doesn't this have an asymptotic chi-squared distribution under the null hypothesis? If so, should there be an option to calculate the p-value that way?

Originally posted by @robertness in https://github.com/py-why/dodiscover/pull/85#pullrequestreview-1252168893
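For context on the question above: in the discrete case with a plug-in (empirical frequency) estimate of CMI in nats, `2 * N * CMI` is the G-test statistic, which is asymptotically chi-squared with `(|X| - 1) * (|Y| - 1) * |Z|` degrees of freedom under conditional independence. Below is a minimal sketch of that p-value calculation. It assumes discrete samples with every variable taking at least two values, and the function name `discrete_cmi_chi2_test` is hypothetical and not part of dodiscover. Whether the same approximation is valid for the estimator used in the PR is not established here.

```python
import numpy as np
from scipy.stats import chi2


def discrete_cmi_chi2_test(x, y, z):
    """G-test of X independent of Y given Z for discrete samples.

    Hypothetical helper (not dodiscover API).
    Returns (cmi_in_nats, g_statistic, degrees_of_freedom, p_value).
    """
    x, y, z = (np.asarray(a) for a in (x, y, z))
    n = x.shape[0]

    # Encode each variable as integer codes 0..K-1.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    _, zi = np.unique(z, return_inverse=True)
    kx, ky, kz = int(xi.max()) + 1, int(yi.max()) + 1, int(zi.max()) + 1

    # Three-way contingency table of joint counts n(x, y, z).
    counts = np.zeros((kx, ky, kz))
    np.add.at(counts, (xi, yi, zi), 1)

    # Marginal counts, kept broadcastable against the full table.
    n_xz = counts.sum(axis=1, keepdims=True)
    n_yz = counts.sum(axis=0, keepdims=True)
    n_z = counts.sum(axis=(0, 1), keepdims=True)

    # Plug-in CMI: sum over nonzero cells of p(x,y,z) * log[n_xyz * n_z / (n_xz * n_yz)].
    mask = counts > 0
    ratio = (counts * n_z)[mask] / (n_xz * n_yz)[mask]
    cmi = float(np.sum(counts[mask] / n * np.log(ratio)))

    # G statistic and its asymptotic chi-squared p-value under the null.
    g = 2.0 * n * cmi
    dof = (kx - 1) * (ky - 1) * kz
    p_value = float(chi2.sf(g, dof))
    return cmi, g, dof, p_value


# Example usage on synthetic data where X and Y are independent given Z.
rng = np.random.default_rng(0)
z_demo = rng.integers(0, 2, size=2000)
x_demo = rng.integers(0, 2, size=2000)
y_demo = rng.integers(0, 3, size=2000)
cmi_demo, g_demo, dof_demo, p_demo = discrete_cmi_chi2_test(x_demo, y_demo, z_demo)
```

The appeal of this route is that it avoids permutation or bootstrap resampling when the sample is large. The chi-squared approximation degrades when many cells of the contingency table have small counts.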

adam2392 avatar Jan 20 '23 03:01 adam2392

@robertness I'm not sure. I'm computing it as val = I(X; Y, Z) - I(X; Z), which I suppose is equivalent to the entropy definition you wrote. Is there a reference? I can add it in a follow-on PR.
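For reference, the equivalence between the mutual-information difference and the entropy form follows from the chain rule for mutual information. This is a sketch for the exact population quantities; finite-sample estimators of the two forms need not agree numerically.

$$
\begin{aligned}
I(X; Y \mid Z) &= I(X; Y, Z) - I(X; Z) \\
&= \bigl[H(X) + H(Y, Z) - H(X, Y, Z)\bigr] - \bigl[H(X) + H(Z) - H(X, Z)\bigr] \\
&= H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z).
\end{aligned}
$$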

adam2392 avatar Jan 20 '23 03:01 adam2392