Distances.jl icon indicating copy to clipboard operation
Distances.jl copied to clipboard

Add Earth Mover's Distance/Wasserstein metric

Open tlnagy opened this issue 9 years ago • 6 comments

EMD is useful for calculating distances between histograms or distributions. It would be cool to see it added to Distances.jl

https://en.wikipedia.org/wiki/Earth_mover's_distance

tlnagy avatar Feb 02 '16 05:02 tlnagy

Would also be neat to see the Sinkhorn distance, or entropy-regularized Wasserstein norm.

currymj avatar May 30 '17 16:05 currymj

We always accept PRs, so if anyone is interested in adding those metrics to this package, it would certainly be welcome.

ararslan avatar May 30 '17 17:05 ararslan

In the context of this package, is there a sensible way to provide an arbitrary cost function when constructing the norm? Or would that not make sense with the design? Another option would just be to provide specific subtypes for the 1-, 2-, and max-norms. But arbitrary cost functions fit better with the general idea.

Likewise for the entropy-regularized case, the type would have to be parameterized with a float coefficient. Is this also reasonable?

currymj avatar May 30 '17 18:05 currymj

Also, I've just noticed that the norms seem to work well for numbers as well as vectors.

Would it be reasonable to define something like WassersteinNorm{N<:Metric}, which works on a vector with elements of some type T for which Metric is meaningful? This might be even cleaner.

This is a very clean and minimal library so I don't want to overcomplicate things.

currymj avatar May 30 '17 18:05 currymj

I made a small wrapper of the original EMD implementation by Rubner, the inventor of the Earth Mover Distance:

https://github.com/mirkobunse/EarthMoversDistance.jl

The wrapper only supports one-dimensional histograms for now, but it is easily extensible. The original wrapped algorithm can compute the EMD of any two distributions. Already, the wrapper allows you to define arbitrary ground distances. I'd appreciate your support in extending the project! Comments are welcome, as well.

Ultimately, I hope that the wrapper helps in implementing the EMD natively in Julia, e.g., in the Distances package. Specifically, efficient special-case algorithms would be great. My wrapper can be used to test such implementations.

mirkobunse avatar Feb 08 '18 09:02 mirkobunse

I wrote a really basic pure julia EMD script here: https://discourse.julialang.org/t/computing-discrete-wasserstein-emd-distance-in-julia/9600/10?u=ckneale

Not sure if this is still of interest but I did see this is an open issue.

caseykneale avatar Mar 14 '20 09:03 caseykneale