EMAworkbench
analysis.clusterer.calculate_cid bug
If the data parameter of the calculate_cid() function contains arrays with a constant value (e.g., [0. 0. 0. 0. 0.]), this implies that ce_i=0 or ce_j=0. The computation in the helper function CID() then breaks, because it divides by min(ce_i, ce_j), which is zero.
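To illustrate why a constant array leads to a zero value, here is a minimal sketch. It assumes that ce_i and ce_j are computed as the square root of the summed squared differences between consecutive points, as in the usual definition of the complexity-invariant distance. The helper name complexity_estimate is hypothetical, and the workbench's actual code may differ.

```python
import numpy as np


def complexity_estimate(x):
    # Assumed definition (not taken from the workbench source):
    # sqrt of the sum of squared differences between consecutive points.
    return np.sqrt(np.sum(np.diff(x) ** 2))


flat = np.zeros(5)                               # static series
varying = np.array([0.0, 1.0, 0.0, 1.0, 0.0])    # series with some movement

ce_flat = complexity_estimate(flat)        # all differences are zero
ce_varying = complexity_estimate(varying)  # four differences of magnitude 1
```

Under this assumption any series that never changes, whatever its level, gets an estimate of exactly zero, so min(ce_i, ce_j) is zero as soon as one such series is in the data.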
Ideally, CID() should handle this special case so that it never divides by zero. I'm not sure whether it causes other issues, but one possible fix is to floor the denominator at a small positive value whenever it would otherwise be zero.
Thus, changing the function from this:
def CID(xi, xj, ce_i, ce_j):
    return np.linalg.norm(xi - xj) * (max(ce_i, ce_j) / min(ce_i, ce_j))
To this:
def CID(xi, xj, ce_i, ce_j):
    return np.linalg.norm(xi - xj) * (max(ce_i, ce_j) / max(0.001, min(ce_i, ce_j)))
Again, I'm not sure whether this is the best solution.
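One case that may be worth checking is when both series are static. The ratio then becomes 0 / 0.001 = 0, so the distance is zero even if the two series sit at different levels. The sketch below compares the proposed fix with a variant that floors both the numerator and the denominator, which keeps the ratio at 1 in that case. The function names and the eps parameter are hypothetical, and this is only one possible alternative, not necessarily the right one for the workbench.

```python
import numpy as np


def cid_floor_denominator(xi, xj, ce_i, ce_j, eps=1e-3):
    # The fix proposed above: only the denominator is floored.
    return np.linalg.norm(xi - xj) * (max(ce_i, ce_j) / max(eps, min(ce_i, ce_j)))


def cid_floor_both(xi, xj, ce_i, ce_j, eps=1e-3):
    # Hypothetical variant: floor both terms, so the ratio never drops below 1.
    largest = max(eps, ce_i, ce_j)
    smallest = max(eps, min(ce_i, ce_j))
    return np.linalg.norm(xi - xj) * (largest / smallest)


# Two static series at different levels; both have ce = 0.
flat_low = np.zeros(5)
flat_high = np.ones(5)

d_denominator = cid_floor_denominator(flat_low, flat_high, 0.0, 0.0)
d_both = cid_floor_both(flat_low, flat_high, 0.0, 0.0)
```

With only the denominator floored, the two static series end up at distance zero. With both terms floored, the result falls back to the plain Euclidean distance between them. When only one of the two values is zero, both variants give the same result.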