
bug fix: information distance is NaN if there are any zeros in the data

jarioksa opened this issue.

If x or y is zero, the expression for the information distance contains the term 0 * log(0), which evaluates to NaN, and a single NaN term makes the whole sum NaN. Therefore the information distance is NaN whenever any species has a zero in either of the compared sites. In most cases, the function returns a matrix containing only NaNs.

log(0) is -Inf, and 0 * (-Inf) is NaN, so these zero entries should be skipped completely. Note that $\lim_{x \to 0^+} x \log(x) = 0$, so skipping zeros is correct: they contribute nothing to the sum.

C currently has the function log2 for base-2 logarithms, but I am not sure it is fully portable. I did not use it; instead, I replaced the evaluation of log(2) with the constant M_LN2 defined in Rmath.h.
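A small C sketch of the base-2 scaling: a value computed with natural logs is divided by M_LN2 (ln 2) rather than by a fresh call to log(2). The helper `to_base2` is hypothetical, and the fallback `#define` is only there so the sketch compiles without Rmath.h:

```c
#include <math.h>

/* M_LN2 (ln 2) comes from Rmath.h in R's C code; plain <math.h> only
   defines it on POSIX/XSI systems, so provide a fallback for this sketch. */
#ifndef M_LN2
#define M_LN2 0.693147180559945309417232121458176568
#endif

/* Hypothetical helper: convert a natural-log quantity into base-2 units
   by dividing by the constant ln 2 instead of computing log(2). */
double to_base2(double value_ln)
{
    return value_ln / M_LN2;
}
```

On a C99 compiler, `to_base2(log(v))` agrees with `log2(v)` up to rounding, so the choice between them is about portability rather than accuracy.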

With this fix, distance and oldDistance return numerically equal results, agreeing to within about $10^{-15}$.

This PR replaces earlier PR #27.

jarioksa, May 17 '23 07:05