POT Numerical issue of ot.emd with large entries

Numerical issue of ot.emd with large entries

Open tlacombe opened this issue 3 years ago • 11 comments

Describe the bug

It seems ot.emd fails to return an optimal plan (up to numerical precision) if there is large entries in the cost matrix (even if the optimal weight to put on these entries is 0).

To Reproduce

import numpy as np
import ot

M = np.array(
    [
        [2.50275352e02, 3.74653218e02, 2.41352736e03, 1.00000000e32, 1.51751540e-03],
        [2.13082030e02, 3.28812836e02, 2.29487946e03, 1.00000000e32, 1.37109800e-01],
        [1.97333083e02, 3.09175848e02, 2.24250550e03, 1.00000000e32, 2.46506283e00],
        [1.00000000e32, 1.00000000e32, 1.00000000e32, 5.26223432e00, 2.50000000e31],
        [3.84690152e01, 8.09465684e01, 3.33064175e02, 2.50000000e31, 0.00000000e00],
    ]
)

a = np.array([0.125, 0.125, 0.125, 0.125, 0.5])
b = np.array([0.125, 0.125, 0.125, 0.125, 0.5])
P = ot.emd(a=a, b=b, M=M, numItermax=2000000)
Q = np.array(
    [
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0.125, 0],
        [0.125, 0.125, 0.125, 0, 0.125],
    ]
)
assert (P.sum(axis=0) == a).all()
assert (P.sum(axis=1) == a).all()
assert (Q.sum(axis=0) == a).all()
assert (Q.sum(axis=1) == a).all()
print("my cost matrix:\n", Q)
print("POT matrix:\n", P)
print("POT cost:", np.sum(np.multiply(P, M)))
print("my cost:", np.sum(np.multiply(Q, M)))

returns:

my cost matrix:
 [[0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.125 0.   ]
 [0.125 0.125 0.125 0.    0.125]]
POT matrix:
 [[0.    0.125 0.    0.    0.   ]
 [0.125 0.    0.    0.    0.   ]
 [0.    0.    0.125 0.    0.   ]
 [0.    0.    0.    0.125 0.   ]
 [0.    0.    0.    0.    0.5  ]]
POT cost: 354.43787279000003
my cost: 57.54321038317501

Expected behavior

ot.emd should return (up to numerical precision) a transport plan (at least) as good as the Q I manually propose.

Environment (please complete the following information):

OS (e.g. MacOS, Windows, Linux): Ubuntu 20.04
Python version: 3.7
How was POT installed (source, pip, conda): conda

Output of the following code snippet:

>>> import platform; print(platform.platform())
Linux-5.4.0-70-generic-x86_64-with-debian-bullseye-sid
>>> import sys; print("Python", sys.version)
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.16.4
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.3.1
>>> import ot; print("POT", ot.__version__)
POT 0.7.0

Additional context

As shown, I set numIterMax at 2000000 and didn't get any warning (and the code run fast) so the algorithm does converge.

Apr 01 '21 14:04 tlacombe

Hi theo, thanks for reporting this bug. I suspect there is an overflow somewhere in the solver, it will be hard to identify. Here is a quick fixup, if it suits your need. Normalizing M by dividing it by its maximum value does not change the solution and it behaves more consistently: P = ot.emd(a=a, b=b, M=M/M.max(), numItermax=2000000) will produce the desired output.

Apr 01 '21 21:04 ncourty

Should we normalize like that the M matrix inside emd then?

This is weird because it seemed to me that when using np.inf in the M matrix the exact solver works very well.

Apr 02 '21 09:04 rflamary

Note that compiling network_simplex_simple.h with a positive DEBUG_LVL does not work (niter is probably supposed to be iter_number).

Apr 05 '21 19:04 mglisse

@ncourty I may be missing something but I tried the same code using normalization in the ot.emd but this didn't solve the issue. I still get the same (non-optimal) transport plan as the one obtained without normalization. I tried replacing 1e32 values by np.inf as suggested by @rflamary , but this does not work in my example at least : the returned transport plan does not satisfy the marginal constraints.

I also tried using ot.emd2 directly, the output seems consistent with doing np.sum(np.multiply(...)) : using M or M/M.max() yields the same result, using M_inf yields NaN.

Code to reproduce (updated version of the previous code):

import numpy as np
import ot

M = np.array(
    [
        [2.50275352e02, 3.74653218e02, 2.41352736e03, 1.00000000e32, 1.51751540e-03],
        [2.13082030e02, 3.28812836e02, 2.29487946e03, 1.00000000e32, 1.37109800e-01],
        [1.97333083e02, 3.09175848e02, 2.24250550e03, 1.00000000e32, 2.46506283e00],
        [1.00000000e32, 1.00000000e32, 1.00000000e32, 5.26223432e00, 2.50000000e31],
        [3.84690152e01, 8.09465684e01, 3.33064175e02, 2.50000000e31, 0.00000000e00],
    ]
)

M_inf = np.array(
    [
        [2.50275352e02, 3.74653218e02, 2.41352736e03, np.inf, 1.51751540e-03],
        [2.13082030e02, 3.28812836e02, 2.29487946e03, np.inf, 1.37109800e-01],
        [1.97333083e02, 3.09175848e02, 2.24250550e03, np.inf, 2.46506283e00],
        [np.inf, np.inf, np.inf, 5.26223432e00, np.inf],
        [3.84690152e01, 8.09465684e01, 3.33064175e02, np.inf, 0.00000000e00],
    ]
)


a = np.array([0.125, 0.125, 0.125, 0.125, 0.5])
b = np.array([0.125, 0.125, 0.125, 0.125, 0.5])

### POT transport plans
P = ot.emd(a=a, b=b, M=M, numItermax=2000000)
P2 = ot.emd(a=a, b=b, M=M/M.max(), numItermax=2000000)
P_inf = ot.emd(a=a,b=b, M=M_inf, numItermax=2000000)

### POT cost computed directly
potc = ot.emd2(a=a, b=b, M=M)
print('cost by pot.emd2 directly:', potc)
potc2 = ot.emd2(a=a,b=b, M=M/M.max()) * M.max()
print("cost by pot.emd2 using M/M.max():", potc2)
potc_inf = ot.emd2(a=a,b=b, M=M_inf)
print("cost by pot.emd2 using M_inf:", potc_inf)

### My transport plan
Q = np.array(
    [
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0.125, 0],
        [0.125, 0.125, 0.125, 0, 0.125],
    ]
)
assert (P.sum(axis=0) == a).all()
assert (P.sum(axis=1) == b).all()
assert (Q.sum(axis=0) == a).all()
assert (Q.sum(axis=1) == b).all()
### P_inf does not satisfy the marginal constraints
#assert(P_inf.sum(axis=0) == a).all()
#assert(P_inf.sum(axis=1) == b).all()

print("*****")
print("my cost matrix:\n", Q)
print("POT transport plan:\n", P)
print("POT transport plan with M_inf:", P_inf)
print("*****")
print("POT cost:", np.sum(np.multiply(P, M)))
print("POT cost after normalization", np.sum(np.multiply(P2, M)))
print("my cost:", np.sum(np.multiply(Q, M)))

Output:

cost by pot.emd2 directly: 354.43787279000003
cost by pot.emd2 using M/M.max(): 354.43787279000003
cost by pot.emd2 using M_inf: nan
*****
my cost matrix:
 [[0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.125 0.   ]
 [0.125 0.125 0.125 0.    0.125]]
POT transport plan:
 [[0.    0.125 0.    0.    0.   ]
 [0.125 0.    0.    0.    0.   ]
 [0.    0.    0.125 0.    0.   ]
 [0.    0.    0.    0.125 0.   ]
 [0.    0.    0.    0.    0.5  ]]
POT transport plan with M_inf: [[0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.125 0.   ]
 [0.    0.    0.    0.    0.5  ]]
*****
POT cost: 354.43787279000003
POT cost after normalization 354.43787279000003
my cost: 57.54321038317501

Apr 19 '21 08:04 tlacombe

interesting bug, i will try to reproduce it on my machine and get back to you.

Apr 21 '21 11:04 rflamary

A bit simplified

import numpy as np
import ot

z = 1e16
M = np.array([[1, z, 0], [1, z, 0], [z, 0, z], [0, z, 0]])

b = np.array([1, 1, 3])
a = np.array([1, 1, 1, 2])
P = ot.emd(a=a, b=b, M=M)
Q = np.array([[0, 0, 1], [0, 0, 1], [0, 1, 0], [1, 0, 1]])
assert (P.sum(axis=0) == b).all()
assert (P.sum(axis=1) == a).all()
assert (Q.sum(axis=0) == b).all()
assert (Q.sum(axis=1) == a).all()
print("my cost matrix:\n", Q)
print("POT matrix:\n", P)
print("POT cost:", np.sum(np.multiply(P, M)))
print("my cost:", np.sum(np.multiply(Q, M)))

my cost matrix: [[0 0 1] [0 0 1] [0 1 0] [1 0 1]] POT matrix: [[1. 0. 0.] [0. 0. 1.] [0. 1. 0.] [0. 0. 2.]] POT cost: 1.0 my cost: 0.0

Could be related to EPSILON or the precision of double indeed, since 1e16 is roughly the limit where it starts failing.

Apr 21 '21 17:04 mglisse

yes that's it good catch! basically we cannot solve the problem precisely if there exist two values different values in m such that m_1+_2=m_1 due to numerical precision.

I dont' think we can say it is a bug or something we can solve if it is due to the numerical precision for float64.

Maybe we can add a warning when the dynamic in M is too large. Note that if you want to forbid a link you just need to set its value to M.max()*1.0001 or something like that instead of a very large value, it will not change the solution or the value.

Again thank you for reporting this weird behavior.

Apr 22 '21 07:04 rflamary

yes that's it good catch! basically we cannot solve the problem precisely if there exist two values different values in m such that m_1+_2=m_1 due to numerical precision.

I dont' think we can say it is a bug or something we can solve if it is due to the numerical precision for float64.

Well, it isn't impossible. If m_1 is not used in the optimal plan, it is possible to compute the optimal plan without relying on something like m_1+m_2-m_1==m_2. As you mentioned in a previous comment, even a cost of +inf could work (the fact that it can currently output something that isn't even a transport plan is a bit alarming). Depending on the algorithm, arranging the computations to be robust like that can be trivial or very complicated and costly, and the question is whether it is worth the trouble. Maybe it isn't here...

Maybe we can add a warning when the dynamic in M is too large. Note that if you want to forbid a link you just need to set its value to M.max()*1.0001 or something like that instead of a very large value, it will not change the solution or the value.

Just around the max of the other values may not be sufficient to forbid a link, it could still be used to transport a small mass in the optimal plan. We will handle truly infinite values separately in Gudhi (it is currently officially unsupported), so hopefully our users won't be tempted to simulate infinity with large values...

Apr 22 '21 19:04 mglisse

POT exact OT solver is a wrapper around the C++ LEMON solver and we modifying it is out of what we can do. In addition, it might require some kind of testings at each iteration that could greatly decrease the performance.

About the max value, I think it is sufficient if the dynamic in M is limited (below 1e14 for instance) adding a cost that is bigger than all the other would prevent any mass on the link because the solver is a Network simplex and the solution will always be on a face of the polytop and under the dynamic condition this face can be improved if there is mass on the largest values of M.

Apr 23 '21 08:04 rflamary

POT exact OT solver is a wrapper around the C++ LEMON solver and we modifying it is out of what we can do.

ok, thanks.

In addition, it might require some kind of testings at each iteration that could greatly decrease the performance.

Yes, although since I am not familiar with the algorithm I have no idea how large the cost would be.

About the max value, I think it is sufficient if the dynamic in M is limited (below 1e14 for instance) adding a cost that is bigger than all the other would prevent any mass on the link because the solver is a Network simplex and the solution will always be on a face of the polytop and under the dynamic condition this face can be improved if there is mass on the largest values of M.

emd(a=[],b=[],M=[[1,10,10],[10,1,10],[10,10,18]]) outputs a diagonal matrix, which in particular puts some mass on the 18, I need something strictly larger than 19 so it outputs a different plan. So "bigger than all the other" isn't quite sufficient, more like bigger than a path that can replace this link, or to be safe maybe bigger than the sum of the other costs.

Apr 23 '21 09:04 mglisse

you are right good example

Apr 23 '21 12:04 rflamary

POT POT copied to clipboard

Numerical issue of ot.emd with large entries

Describe the bug

To Reproduce

Expected behavior

Environment (please complete the following information):

Additional context

POT
POT copied to clipboard