
EMD not leading to any results, without any error raised

Open ChristelDG opened this issue 3 years ago • 1 comments

Hello,

My question may be a very easy one, but I am losing my nerve trying to solve it. Here are my parameters:

    first_histogram = [1. 1. 1.]
    second_histogram = [2. 0. 0.]
    distance_matrix = array([[0.        , 0.        , 0.        , 0.60058105, 1.        ],
                             [0.        , 0.        , 0.        , 0.60058105, 1.        ],
                             [0.        , 0.        , 0.        , 0.60058105, 1.        ],
                             [0.60058105, 0.60058105, 0.60058105, 0.        , 0.98793931],
                             [1.        , 1.        , 1.        , 0.98793931, 0.        ]])

(My distance matrix is the result of sklearn.metrics.pairwise.cosine_distances(), so it truly is a distance matrix.) Now if I try to do:

    D_EMD = emd(first_histogram, second_histogram, distance_matrix)

The code runs forever without returning a result and without raising any error.
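One way to confirm a hang like this without locking up the working session is to run the call in a child interpreter and kill it after a timeout. The sketch below is only an illustration: `run_snippet_with_timeout` and `REPRO` are hypothetical names, the 10-second limit is an arbitrary assumption, and the snippet assumes `numpy` and `pyemd` are installed. The input values are copied from the parameters above.

```python
import subprocess
import sys


def run_snippet_with_timeout(snippet, timeout_s=10.0):
    """Run Python source in a child interpreter.

    Returns ("ok", stdout), ("error", stderr) or ("timeout", "").
    A separate process is used because a call stuck inside compiled
    code cannot be interrupted from a Python thread.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    if proc.returncode != 0:
        return "error", proc.stderr.strip()
    return "ok", proc.stdout.strip()


# Reproduction script using the values reported in this issue.
REPRO = """
import numpy as np
from pyemd import emd

first_histogram = np.array([1.0, 1.0, 1.0])
second_histogram = np.array([2.0, 0.0, 0.0])
distance_matrix = np.array([
    [0.0, 0.0, 0.0, 0.60058105, 1.0],
    [0.0, 0.0, 0.0, 0.60058105, 1.0],
    [0.0, 0.0, 0.0, 0.60058105, 1.0],
    [0.60058105, 0.60058105, 0.60058105, 0.0, 0.98793931],
    [1.0, 1.0, 1.0, 0.98793931, 0.0],
])
print(emd(first_histogram, second_histogram, distance_matrix))
"""

if __name__ == "__main__":
    status, output = run_snippet_with_timeout(REPRO, timeout_s=10.0)
    print(status, output)
```

A `"timeout"` status would match the behaviour described here, while `"error"` would mean the call failed for another reason (for example, `pyemd` not being installed), with the traceback in the second element.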

Does anyone have any idea what I'm doing wrong?

Thanks a lot!

Christel

ChristelDG, Jan 25 '22 17:01

Thanks. I reproduced this on both macOS and Debian. The hang depends on having two zeros in the second histogram: the value of the nonzero entry (2.0 here) doesn't matter, and changing the extra mass penalty doesn't seem to help.

This is almost certainly a problem with how the underlying algorithm in C++ handles some edge cases; unfortunately I don't have the bandwidth right now to look into it further. Likely related to #54. If you discover the issue then a PR would be most welcome.
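For anyone trying to narrow this down, it may help to summarize the inputs before calling `emd`, so that cases matching the pattern above (two zero bins in the second histogram) are easy to spot. The following is a minimal sketch in plain Python; `describe_emd_inputs` is a hypothetical helper, not part of pyemd, and it only reports facts about the inputs. It does not predict whether the call will hang.

```python
def describe_emd_inputs(first_histogram, second_histogram, distance_matrix):
    """Summarize lengths, total mass, and zero bins of EMD inputs.

    Accepts lists or NumPy arrays. Purely descriptive: it makes no
    claim about how the underlying solver will behave.
    """
    first = [float(x) for x in first_histogram]
    second = [float(x) for x in second_histogram]
    rows = len(distance_matrix)
    cols = len(distance_matrix[0]) if rows else 0
    return {
        "first_len": len(first),
        "second_len": len(second),
        "matrix_shape": (rows, cols),
        "first_mass": sum(first),
        "second_mass": sum(second),
        "first_zero_bins": sum(1 for x in first if x == 0.0),
        "second_zero_bins": sum(1 for x in second if x == 0.0),
    }


# Values as reported in this issue.
summary = describe_emd_inputs(
    [1.0, 1.0, 1.0],
    [2.0, 0.0, 0.0],
    [
        [0.0, 0.0, 0.0, 0.60058105, 1.0],
        [0.0, 0.0, 0.0, 0.60058105, 1.0],
        [0.0, 0.0, 0.0, 0.60058105, 1.0],
        [0.60058105, 0.60058105, 0.60058105, 0.0, 0.98793931],
        [1.0, 1.0, 1.0, 0.98793931, 0.0],
    ],
)
print(summary)
```

For the reported inputs, the summary shows two zero bins in the second histogram, unequal total mass (3.0 versus 2.0), and histograms of length 3 alongside a 5×5 distance matrix. Whether the last two details play any role in the hang is not established here.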

wmayner avatar Jan 26 '22 16:01 wmayner