ZeroDivisionError: float division error when computing per_cluster_scores

Open Dicksonchin93 opened this issue 3 years ago • 3 comments

/python2.7/site-packages/hdbscan/prediction.py", line 484, in membership_vector
    clusterer.prediction_data_.cluster_tree)
File "hdbscan/_prediction_utils.pyx", line 217, in hdbscan._prediction_utils.outlier_membership_vector
File "hdbscan/_prediction_utils.pyx", line 224, in hdbscan._prediction_utils.outlier_membership_vector
File "hdbscan/_prediction_utils.pyx", line 213, in hdbscan._prediction_utils.per_cluster_scores
ZeroDivisionError: float division

Can we add a handler similar to https://github.com/scikit-learn-contrib/hdbscan/commit/fe3d303570d3968aa2641af64c99ce2bd297ae1a?

Dicksonchin93, Oct 05 '21 06:10

I can make a PR if that is fine with the HDBSCAN team.

Dicksonchin93, Oct 06 '21 03:10

Correct me if I am wrong, but it seems the merge height function https://github.com/scikit-learn-contrib/hdbscan/blob/47ef913970b7506789b46d53d37945f97c08fedf/hdbscan/_prediction_utils.pyx#L118-L183 might merge heights across clusters that do not belong to the same nearest-neighbour tree parent. That would allow the max used in https://github.com/scikit-learn-contrib/hdbscan/blob/47ef913970b7506789b46d53d37945f97c08fedf/hdbscan/_prediction_utils.pyx#L203 to be lower than the merged height, in which case the denominator can be zero or negative.

I'm proposing a simple fix: default the denominator to 1e-8 if it is zero or negative. I still believe there is some sort of logic error behind this, though.
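
A minimal sketch of the guard described above, in plain Python rather than the library's Cython. It assumes the score has the form max_lambda / (max_lambda - height), which is inferred from the description in this thread; the function name guarded_scores and its parameters are hypothetical and are not part of hdbscan.

```python
def guarded_scores(max_lambda, heights, eps=1e-8):
    """Hypothetical stand-in for the per-cluster score loop.

    Assumes each score is max_lambda / (max_lambda - height).
    """
    scores = []
    for height in heights:
        denom = max_lambda - height
        # Proposed guard: a zero or negative denominator falls back to eps,
        # so the division can no longer raise ZeroDivisionError.
        if denom <= 0.0:
            denom = eps
        scores.append(max_lambda / denom)
    return scores


# A height equal to or above max_lambda no longer crashes; it yields a
# very large positive score instead.
print(guarded_scores(2.0, [1.0, 2.0, 3.0]))
```

This only hides the symptom: a height above max_lambda still points at the suspected mismatch in how merge heights are computed.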

Dicksonchin93, Oct 07 '21 07:10

We are always happy to receive pull requests, though I must admit it can be challenging to find the time to review them all.

jc-healy, Oct 07 '21 21:10