dodiscover
Allow CMI to be approximated with chi-square for large sample sizes?
LGTM. One comment. The CMI value that is calculated as follows:
val = hxyz - (hxz + hyz - hz).mean()
Doesn't this have an asymptotic chi-squared distribution under the null hypothesis? If so, should there be an option to calculate the p-value that way?
Originally posted by @robertness in https://github.com/py-why/dodiscover/pull/85#pullrequestreview-1252168893
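For context on the question above: for discrete variables with the plug-in (empirical-frequency) estimator, 2N times the CMI in nats is the G-statistic, which is asymptotically chi-squared with (|X|-1)(|Y|-1)|Z| degrees of freedom under conditional independence. A minimal sketch of a p-value computed that way is below. The function name `discrete_cmi_chi2_pvalue` is hypothetical and not part of dodiscover, and whether the same approximation holds for the estimator used in the PR is exactly the open question here.

```python
import numpy as np
from scipy.stats import chi2


def discrete_cmi_chi2_pvalue(x, y, z):
    """Plug-in CMI (in nats) for discrete 1-D samples, with a chi-square p-value.

    Hypothetical helper for illustration only; assumes discrete data and
    at least two observed levels for both x and y.
    """
    # Encode each variable as integer codes 0..K-1.
    xi = np.unique(np.asarray(x).ravel(), return_inverse=True)[1]
    yi = np.unique(np.asarray(y).ravel(), return_inverse=True)[1]
    zi = np.unique(np.asarray(z).ravel(), return_inverse=True)[1]
    n = xi.shape[0]
    kx, ky, kz = xi.max() + 1, yi.max() + 1, zi.max() + 1

    # Three-way contingency table of joint counts N(x, y, z).
    counts = np.zeros((kx, ky, kz))
    np.add.at(counts, (xi, yi, zi), 1)

    # Marginal counts, kept broadcastable against the full table.
    n_xz = counts.sum(axis=1, keepdims=True)
    n_yz = counts.sum(axis=0, keepdims=True)
    n_z = counts.sum(axis=(0, 1), keepdims=True)

    # CMI = sum p(x,y,z) * log[ N(x,y,z) N(z) / (N(x,z) N(y,z)) ] over nonzero cells.
    mask = counts > 0
    ratio = (counts * n_z)[mask] / (n_xz * n_yz)[mask]
    cmi = float(np.sum(counts[mask] / n * np.log(ratio)))

    # G-statistic and its asymptotic chi-square reference distribution.
    g_stat = 2.0 * n * cmi
    dof = (kx - 1) * (ky - 1) * kz
    pvalue = float(chi2.sf(g_stat, dof))
    return cmi, pvalue
```

The chi-square route avoids permutations entirely, which is the appeal in large samples; for small samples or sparse tables the approximation can be poor, so a permutation-based p-value would remain the safer default.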
@robertness I'm not sure. I'm computing it as val = I(X; Y, Z) - I(X; Z), which is equivalent to the entropy definition you wrote, I suppose. Is there a reference? I can add it in a follow-on PR.
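The equivalence mentioned in the reply follows from the chain rule for mutual information. A short derivation is sketched below using the standard Shannon entropy form of CMI; how the terms in the quoted snippet map onto these entropies depends on the estimator and is not assumed here.

```latex
\begin{aligned}
I(X;Y \mid Z) &= I(X;Y,Z) - I(X;Z) \\
&= \bigl[H(X) + H(Y,Z) - H(X,Y,Z)\bigr] - \bigl[H(X) + H(Z) - H(X,Z)\bigr] \\
&= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).
\end{aligned}
```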