CaDrA
Signed Mutual Information
[Note: I'm adding the content of my email here for record keeping]
revealer returns a signed MI because it multiplies the MI-derived score by the sign of the features’ correlation.
In the code, you will see that cond_mutual_inf has the step (line 202):
CIC <- sign(rho) * sqrt(1 - exp(-2 * CMI))
And the mutual_inf_v2 function has the step (line 248):
IC <- sign(rho) * sqrt(1 - exp(-2 * MI))
In both cases, the MI is first rescaled to the [0, 1) range via sqrt(1 - exp(-2 * MI)) and then multiplied by the sign of the correlation (rho) between the two variables.
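To make the effect of that step concrete, here is a small sketch using made-up correlation values. It relies on the standard result that, for a bivariate Gaussian, MI = -0.5 * log(1 - rho^2) (in nats), so the transformation should give back the correlation itself, sign included. The values of rho are purely illustrative and are not taken from revealer.

```r
# Illustrative values only: correlations for two hypothetical feature pairs.
rho <- c(-0.8, 0.3)

# For a bivariate Gaussian, MI (in nats) is -0.5 * log(1 - rho^2).
MI <- -0.5 * log(1 - rho^2)

# Same transformation as in the two steps quoted above.
IC <- sign(rho) * sqrt(1 - exp(-2 * MI))

# Compare the signed coefficient with the original correlation.
print(all.equal(IC, rho))
```

Under the Gaussian assumption the magnitude sqrt(1 - exp(-2 * MI)) equals |rho|, so the sign factor is the only thing that restores the direction of the association.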
I think we can do the same in our knnmi-based score. To avoid losing efficiency, we could call the cor function once on the entire set of features; i.e., when computing the MI between X and all the remaining features (say, REST), do something like:
MI <- knnmi(X,REST,Z)
RHO <- cor(X,REST)
SMI <- MI * sign(RHO)
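A minimal runnable sketch of the same idea, under several assumptions: REST stores features in columns (cor correlates a vector against the columns of a matrix), the conditioning variable Z is left out, and toy_mi is a hypothetical stand-in for the knnmi call, not the knnmi API. The simulated data are illustrative only.

```r
set.seed(1)
n <- 200
X <- rnorm(n)

# Hypothetical feature matrix: samples in rows, features in columns.
REST <- cbind(pos   =  X + rnorm(n, sd = 0.5),
              neg   = -X + rnorm(n, sd = 0.5),
              noise = rnorm(n))

# Hypothetical stand-in for the knnmi call: a Gaussian plug-in MI estimate.
# It is NOT the knnmi estimator; it only supplies non-negative scores.
toy_mi <- function(x, ys) {
  apply(ys, 2, function(y) -0.5 * log(1 - cor(x, y)^2))
}

MI <- toy_mi(X, REST)

# One vectorized call: cor() of a vector against a matrix returns a
# 1 x ncol(REST) matrix, so drop it to a plain named vector.
RHO <- drop(cor(X, REST))

# Attach the sign of the correlation to each MI score.
SMI <- MI * sign(RHO)
print(SMI)
```

If the features are stored in rows instead, REST would need to be transposed with t() before the cor call. Note also that the sign here comes from the plain correlation, not one conditioned on Z.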