EMAworkbench

analysis.clusterer.calculate_cid bug

Open max-reddel opened this issue 2 years ago • 0 comments

If the data parameter of the calculate_cid() function contains arrays with a constant value (e.g., [0. 0. 0. 0. 0.]), this implies that ce_i=0 or ce_j=0, which then breaks the computation in the helper function CID() when it attempts to divide by zero.
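A minimal sketch that reproduces the problem outside the workbench. The helper complexity_estimate() is an assumption made for illustration (root of the summed squared successive differences); it is not copied from the library. The ce values are assumed to be NumPy floats, so the division yields inf or nan with a RuntimeWarning; with plain Python floats the same expression would raise ZeroDivisionError instead.

```python
import numpy as np


def complexity_estimate(x):
    # Assumed definition for illustration: sqrt of the summed squared
    # differences between consecutive points. A constant series gives 0.
    return np.sqrt(np.sum(np.diff(x) ** 2))


def CID(xi, xj, ce_i, ce_j):
    # Current helper as quoted in this issue.
    return np.linalg.norm(xi - xj) * (max(ce_i, ce_j) / min(ce_i, ce_j))


flat = np.zeros(5)                            # constant series
ramp = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # non-constant series

ce_flat = complexity_estimate(flat)
ce_ramp = complexity_estimate(ramp)

# Silence the divide-by-zero / invalid-value warnings to inspect the results.
with np.errstate(divide="ignore", invalid="ignore"):
    d_flat_ramp = CID(flat, ramp, ce_flat, ce_ramp)  # nonzero / 0
    d_flat_flat = CID(flat, flat, ce_flat, ce_flat)  # 0 / 0

print(ce_flat, ce_ramp, d_flat_ramp, d_flat_flat)
```

Under these assumptions, one constant series produces an infinite distance and two constant series produce nan, either of which would propagate into the distance matrix used for clustering.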

Ideally, a special case needs to be introduced so that CID() does not divide by zero. I'm not sure whether this would cause other issues, but it might be possible to solve the problem by flooring the denominator with a small positive value whenever it would otherwise be zero.

Thus, changing the function from this:

def CID(xi, xj, ce_i, ce_j):
    return np.linalg.norm(xi - xj) * (max(ce_i, ce_j) / min(ce_i, ce_j)) 

To this:

def CID(xi, xj, ce_i, ce_j):
    return np.linalg.norm(xi - xj) * (max(ce_i, ce_j) / max(0.001, min(ce_i, ce_j)))

Again, I'm not sure whether this is the best solution.
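One possible side effect of flooring only the denominator: if both ce_i and ce_j are zero, the numerator is zero as well, so two different constant series would get a distance of 0. A hedged alternative sketch is to floor both complexity values, so that the ratio becomes 1 in that case and the result falls back to the plain Euclidean distance. The names CID_floor_both, CID_floor_denominator and EPS are hypothetical, and the floor of 0.001 is simply the value used in the proposal above, not a tuned choice.

```python
import numpy as np

EPS = 0.001  # hypothetical floor, same value as in the proposal above


def CID_floor_denominator(xi, xj, ce_i, ce_j):
    # Proposal from this issue: floor only the denominator.
    return np.linalg.norm(xi - xj) * (max(ce_i, ce_j) / max(EPS, min(ce_i, ce_j)))


def CID_floor_both(xi, xj, ce_i, ce_j, eps=EPS):
    # Alternative sketch: floor both complexity estimates so that
    # two constant series give a ratio of 1 instead of 0.
    ce_hi = max(ce_i, ce_j, eps)
    ce_lo = max(min(ce_i, ce_j), eps)
    return np.linalg.norm(xi - xj) * (ce_hi / ce_lo)


a = np.zeros(5)       # constant at 0
b = np.full(5, 2.0)   # constant at 2, so both complexity estimates are 0

d_denominator_only = CID_floor_denominator(a, b, 0.0, 0.0)
d_both = CID_floor_both(a, b, 0.0, 0.0)

print(d_denominator_only, d_both)
```

Here the denominator-only variant reports the two series as identical, while the variant that floors both values returns their Euclidean distance. Whether that fallback is the desired behaviour for clustering is an open question.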

max-reddel, Apr 21 '22 08:04