
Question on how to compute normalized mutual information for discrete and continuous data

Open ivan-marroquin opened this issue 3 years ago • 2 comments

Hi Greg,

Many thanks for making available such great Python code!

I was wondering if you could provide suggestions on how to compute normalized mutual information for a mix of discrete and continuous data. I would expect the normalized version of mutual information to be in the range [0, 1].

Kind regards, Ivan

ivan-marroquin avatar Dec 10 '21 19:12 ivan-marroquin

That makes sense. If X is continuous and Z is discrete, then I(X;Z) = H(Z) - H(Z|X) <= H(Z), where H is Shannon entropy and is always non-negative. So using I(X;Z) / H(Z) is probably your best bet for a normalized quantity.

For estimation, you will have to use the "micd" estimator (mutual information between continuous and discrete), which does this through I(X;Z) = h(X) - h(X|Z), where h is differential entropy estimated using NPEET. My one worry is that because the estimation errors of the two terms can differ, you may sometimes get values outside your desired range. Depending on your scenario, you could deal with that in different ways: clip the values, or use bootstrap sampling to get a range of possible values.
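Something along these lines might work (just a sketch, not tested here; I'm assuming the micd and entropyd functions from entropy_estimators and that samples are passed as one-row-per-sample arrays):

```python
import numpy as np
from npeet import entropy_estimators as ee  # assuming NPEET is importable this way

# Toy data: a discrete label Z and a continuous X that depends on it.
rng = np.random.default_rng(0)
z = rng.integers(0, 3, size=(500, 1))             # discrete, one sample per row
x = z + rng.normal(scale=0.5, size=(500, 1))      # continuous, one sample per row

mi_xz = ee.micd(x, z)    # I(X;Z) from the mixed continuous/discrete estimator
h_z = ee.entropyd(z)     # H(Z) from the discrete (plug-in) entropy estimator

# Because the estimation errors of the two terms differ, the ratio can fall
# slightly outside [0, 1], so clip it as discussed above.
nmi = float(np.clip(mi_xz / h_z, 0.0, 1.0))
print(nmi)
```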

gregversteeg avatar Dec 11 '21 18:12 gregversteeg

Hi Greg,

Thanks for the prompt answer and explanation. So I assume that if I only have continuous data, the normalized mutual information can still be computed as I(X;Z) / H(Z), in which case I use your continuous estimator for mutual information. On the other hand, if I only have discrete data, then I compute the mutual information with the discrete version of the estimator. Am I correct?
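In code, I picture something like the following (only a sketch, and I'm guessing at the function names mi, midd, entropy and entropyd from entropy_estimators):

```python
import numpy as np
from npeet import entropy_estimators as ee  # assuming NPEET is importable this way

rng = np.random.default_rng(1)

# Continuous-only case: KSG estimators for both I(X;Z) and H(Z).
x_c = rng.normal(size=(500, 1))
z_c = x_c + rng.normal(scale=0.5, size=(500, 1))
nmi_continuous = ee.mi(x_c, z_c) / ee.entropy(z_c)

# Discrete-only case: plug-in (discrete) estimators for both terms.
x_d = rng.integers(0, 4, size=(500, 1))
z_d = (x_d + rng.integers(0, 2, size=(500, 1))) % 4
nmi_discrete = ee.midd(x_d, z_d) / ee.entropyd(z_d)

print(nmi_continuous, nmi_discrete)
```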

Ivan

ivan-marroquin avatar Dec 13 '21 19:12 ivan-marroquin