Parse large instances with 15K or 30K nodes
The large instances of Arnold, Gendreau and Sörensen (2017) in CVRPLIB currently hang during the edge weight calculation. These instances are very large, ranging from 6K to 30K nodes. Let's see if we can support such large instances as well.
I performed a quick benchmark with the current distance computation and four alternatives:
Script
import timeit

import numba
import numpy as np
from tabulate import tabulate


def current(coords: np.ndarray) -> np.ndarray:
    # Full (n, n, 2) broadcast: simple, but memory-hungry for large n.
    diff = coords[:, np.newaxis, :] - coords
    square_diff = diff**2
    square_dist = np.sum(square_diff, axis=-1)
    return np.sqrt(square_dist)


def faster_current(pts: np.ndarray) -> np.ndarray:
    # Uses |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so only (n, n) arrays are needed.
    sq_sum = np.sum(pts**2, axis=1)
    sq_diff = np.add.outer(sq_sum, sq_sum) - 2 * np.dot(pts, pts.T)
    np.fill_diagonal(sq_diff, 0)  # avoids minor numerical issues
    return np.sqrt(sq_diff)


def numpy_linalg(X: np.ndarray) -> np.ndarray:
    return np.linalg.norm(X[:, np.newaxis] - X, axis=2)


@numba.jit(nopython=True, fastmath=True)
def euclidean(u, v):
    n = len(u)
    dist = 0
    for i in range(n):
        dist += abs(u[i] - v[i]) ** 2
    return dist ** (1 / 2)


@numba.jit(nopython=True, fastmath=True)
def variant_numba(a: np.ndarray) -> np.ndarray:
    # Only computes the lower triangle and mirrors it afterwards.
    n = a.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            out[i, j] = euclidean(a[i], a[j])
    return out + out.T


def variant_scipy(pts: np.ndarray) -> np.ndarray:
    from scipy.spatial import distance_matrix

    return distance_matrix(pts, pts)


if __name__ == "__main__":
    np.random.seed(1)
    headers = ["# of coords", "Function", "Time (s)"]
    NUMBER = 3

    for num_coords in [5000, 10000, 15000]:
        coords = np.random.randn(num_coords, 2)
        funcs = [
            current,
            faster_current,
            numpy_linalg,
            variant_numba,
            variant_scipy,
        ]

        results = []
        for func in funcs:
            elapsed = timeit.timeit(
                f"{func.__name__}(coords)", globals=globals(), number=NUMBER
            )
            avg = round(elapsed / NUMBER, 2)
            results.append((num_coords, func.__name__, avg))

        print(tabulate(results, headers=headers))

    # current and numpy_linalg are too slow beyond 15K coordinates.
    for num_coords in [20000, 25000, 30000]:
        coords = np.random.randn(num_coords, 2)
        funcs = [
            faster_current,
            variant_numba,
            variant_scipy,
        ]

        results = []
        for func in funcs:
            elapsed = timeit.timeit(
                f"{func.__name__}(coords)", globals=globals(), number=NUMBER
            )
            avg = round(elapsed / NUMBER, 2)
            results.append((num_coords, func.__name__, avg))

        print(tabulate(results, headers=headers))
Results
# of coords Function Time (s)
------------- -------------- ----------
5000 current 0.4
5000 faster_current 0.14
5000 numpy_linalg 0.41
5000 variant_numba 0.21
5000 variant_scipy 0.33
# of coords Function Time (s)
------------- -------------- ----------
10000 current 1.58
10000 faster_current 0.44
10000 numpy_linalg 1.58
10000 variant_numba 0.36
10000 variant_scipy 1.19
# of coords Function Time (s)
------------- -------------- ----------
15000 current 4.29
15000 faster_current 0.89
15000 numpy_linalg 4.94
15000 variant_numba 0.76
15000 variant_scipy 2.62
# of coords Function Time (s)
------------- -------------- ----------
20000 faster_current 1.54
20000 variant_numba 1.39
20000 variant_scipy 4.61
# of coords Function Time (s)
------------- -------------- ----------
25000 faster_current 2.37
25000 variant_numba 2.16
25000 variant_scipy 7.2
# of coords Function Time (s)
------------- -------------- ----------
30000 faster_current 66.12
30000 variant_numba 6.8
30000 variant_scipy 10.86
For instances up to 25K nodes, faster_current is pretty fast. At 30K, however, it becomes much slower than the numba and scipy variants. Could this be due to memory issues? I'm not sure. In any case, most users don't solve 30K instances, and a runtime of around a minute is acceptable for those who do.
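As a rough sanity check on the memory hypothesis (my own back-of-the-envelope estimate, not something I measured), a single 30K x 30K float64 matrix is already several gigabytes, and faster_current keeps a handful of such arrays alive at once, so swapping seems plausible:

# Back-of-the-envelope memory estimate (assumes float64 throughout).
n = 30_000
gib_per_matrix = n * n * 8 / 1024**3  # one n x n float64 matrix
print(f"{gib_per_matrix:.1f} GiB per matrix")  # roughly 6.7 GiB
# faster_current materialises several such matrices at once (the dot
# product, the outer sum, and the square root), so the peak footprint
# is a small multiple of this and can exceed typical RAM at n = 30K.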
I also don't want vrplib to rely on heavy dependencies like numba or scipy, so sticking to the numpy version is preferred.
I'll implement the faster_current version soon.
Another option is to write some C++ code.
See this for setting up Poetry with a custom build.
The scipy variant seems to scale very well and is faster than what we have now. Looking at the implementation here, the relevant bits are just a few lines of code. Could we implement our calculation similarly?
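If we go that route, a minimal pure-NumPy sketch could look something like the following (the name chunked_distance_matrix and the chunk size are my own illustration, not scipy's actual code). As I understand it, the idea is the same as scipy's row-wise fallback above its size threshold: only broadcast a block of rows at a time so the intermediate array stays small.

import numpy as np

def chunked_distance_matrix(pts: np.ndarray, chunk: int = 1000) -> np.ndarray:
    # Pairwise Euclidean distances, computed one block of rows at a time so
    # the broadcast intermediate is (chunk, n, 2) instead of (n, n, 2).
    n = pts.shape[0]
    out = np.empty((n, n))
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        diff = pts[start:end, np.newaxis, :] - pts[np.newaxis, :, :]
        out[start:end] = np.sqrt(np.sum(diff**2, axis=-1))
    return out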
On my machine (Windows with an x86/i7 CPU from last year), the results look something like this:
# of coords Function Time (s)
------------- -------------- ----------
5000 current 0.62
5000 faster_current 0.28
5000 numpy_linalg 0.63
5000 variant_scipy 0.63
10000 current 2.45
10000 faster_current 1.14
10000 numpy_linalg 2.48
10000 variant_scipy 2.4
15000 current 5.68
15000 faster_current 2.71
15000 numpy_linalg 6.16
15000 variant_scipy 5.94
20000 current 10.3
20000 faster_current 5.71
20000 variant_scipy 12.82
25000 faster_current 10.2
25000 variant_scipy 26.54
30000 faster_current 27.49
30000 variant_scipy 41.49
Here, faster_current is the clear winner, but scipy doesn't seem to do so well. Perhaps we should just implement faster_current and be done with it.
Yeah, agreed!