Parse large instances with 15K or 30K nodes
The large instances of Arnold, Gendreau and Sörensen (2017) in CVRPLIB currently hang during the edge weight calculation. These instances are very large, ranging from 6K to 30K nodes. Let's see if we can support such large instances as well.
I performed a quick benchmark with the current distance computation and four alternatives:
Script
import timeit

import numba
import numpy as np
from tabulate import tabulate


def current(coords: np.ndarray) -> np.ndarray:
    # Full (n, n, 2) broadcast: simple, but memory-hungry for large n.
    diff = coords[:, np.newaxis, :] - coords
    square_diff = diff**2
    square_dist = np.sum(square_diff, axis=-1)
    return np.sqrt(square_dist)


def faster_current(pts: np.ndarray) -> np.ndarray:
    # Uses |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so only (n, n) arrays are needed.
    sq_sum = np.sum(pts**2, axis=1)
    sq_diff = np.add.outer(sq_sum, sq_sum) - 2 * np.dot(pts, pts.T)
    np.fill_diagonal(sq_diff, 0)  # avoids minor numerical issues
    return np.sqrt(sq_diff)


def numpy_linalg(X: np.ndarray) -> np.ndarray:
    return np.linalg.norm(X[:, np.newaxis] - X, axis=2)


@numba.jit(nopython=True, fastmath=True)
def euclidean(u, v):
    n = len(u)
    dist = 0
    for i in range(n):
        dist += abs(u[i] - v[i]) ** 2
    return dist ** (1 / 2)


@numba.jit(nopython=True, fastmath=True)
def variant_numba(a: np.ndarray) -> np.ndarray:
    # Only computes the lower triangle and mirrors it afterwards.
    n = a.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            out[i, j] = euclidean(a[i], a[j])
    return out + out.T


def variant_scipy(pts: np.ndarray) -> np.ndarray:
    from scipy.spatial import distance_matrix

    return distance_matrix(pts, pts)


if __name__ == "__main__":
    np.random.seed(1)
    headers = ["# of coords", "Function", "Time (s)"]
    NUMBER = 3

    for num_coords in [5000, 10000, 15000]:
        coords = np.random.randn(num_coords, 2)
        funcs = [
            current,
            faster_current,
            numpy_linalg,
            variant_numba,
            variant_scipy,
        ]

        results = []
        for func in funcs:
            elapsed = timeit.timeit(
                f"{func.__name__}(coords)", globals=globals(), number=NUMBER
            )
            avg = round(elapsed / NUMBER, 2)
            results.append((num_coords, func.__name__, avg))

        print(tabulate(results, headers=headers))

    # current and numpy_linalg are too slow beyond 15K coordinates.
    for num_coords in [20000, 25000, 30000]:
        coords = np.random.randn(num_coords, 2)
        funcs = [
            faster_current,
            variant_numba,
            variant_scipy,
        ]

        results = []
        for func in funcs:
            elapsed = timeit.timeit(
                f"{func.__name__}(coords)", globals=globals(), number=NUMBER
            )
            avg = round(elapsed / NUMBER, 2)
            results.append((num_coords, func.__name__, avg))

        print(tabulate(results, headers=headers))
Results
# of coords Function Time (s)
------------- -------------- ----------
5000 current 0.4
5000 faster_current 0.14
5000 numpy_linalg 0.41
5000 variant_numba 0.21
5000 variant_scipy 0.33
# of coords Function Time (s)
------------- -------------- ----------
10000 current 1.58
10000 faster_current 0.44
10000 numpy_linalg 1.58
10000 variant_numba 0.36
10000 variant_scipy 1.19
# of coords Function Time (s)
------------- -------------- ----------
15000 current 4.29
15000 faster_current 0.89
15000 numpy_linalg 4.94
15000 variant_numba 0.76
15000 variant_scipy 2.62
# of coords Function Time (s)
------------- -------------- ----------
20000 faster_current 1.54
20000 variant_numba 1.39
20000 variant_scipy 4.61
# of coords Function Time (s)
------------- -------------- ----------
25000 faster_current 2.37
25000 variant_numba 2.16
25000 variant_scipy 7.2
# of coords Function Time (s)
------------- -------------- ----------
30000 faster_current 66.12
30000 variant_numba 6.8
30000 variant_scipy 10.86
For instances up to 25K nodes, faster_current is pretty fast. At 30K, however, it becomes much slower than the numba and scipy variants. Could this be due to memory issues? I'm not sure. In any case, most users don't solve 30K instances, and a runtime of around a minute is acceptable for those who do.
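As a rough sanity check on the memory hypothesis (my own back-of-the-envelope estimate, not something I measured), a single 30K x 30K float64 matrix is already several gigabytes, and faster_current keeps a handful of such arrays alive at once, so swapping seems plausible:

# Back-of-the-envelope memory estimate (assumes float64 throughout).
n = 30_000
gib_per_matrix = n * n * 8 / 1024**3  # one n x n float64 matrix
print(f"{gib_per_matrix:.1f} GiB per matrix")  # roughly 6.7 GiB
# faster_current materialises several such matrices at once (the dot
# product, the outer sum, and the square root), so the peak footprint
# is a small multiple of this and can exceed typical RAM at n = 30K.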
I also don't want vrplib to rely on heavy dependencies like numba or scipy, so sticking to the numpy version is preferred.
I'll implement the faster_current version soon.
Another option is to write some C++ code.
See this for setting up Poetry with a custom build.
The scipy variant seems to scale very well and is faster than what we have now. Looking at the implementation here, the relevant bits are just a few lines of code. Could we implement our calculation similarly?
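If we go that route, a minimal pure-NumPy sketch could look something like the following (the name chunked_distance_matrix and the chunk size are my own illustration, not scipy's actual code). As I understand it, the idea is the same as scipy's row-wise fallback above its size threshold: only broadcast a block of rows at a time so the intermediate array stays small.

import numpy as np

def chunked_distance_matrix(pts: np.ndarray, chunk: int = 1000) -> np.ndarray:
    # Pairwise Euclidean distances, computed one block of rows at a time so
    # the broadcast intermediate is (chunk, n, 2) instead of (n, n, 2).
    n = pts.shape[0]
    out = np.empty((n, n))
    for start in range(0, n, chunk):
        end = min(start + chunk, n)
        diff = pts[start:end, np.newaxis, :] - pts[np.newaxis, :, :]
        out[start:end] = np.sqrt(np.sum(diff**2, axis=-1))
    return out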
On my machine (Windows with an x86/i7 CPU from last year), the results look something like this:
# of coords Function Time (s)
------------- -------------- ----------
5000 current 0.62
5000 faster_current 0.28
5000 numpy_linalg 0.63
5000 variant_scipy 0.63
10000 current 2.45
10000 faster_current 1.14
10000 numpy_linalg 2.48
10000 variant_scipy 2.4
15000 current 5.68
15000 faster_current 2.71
15000 numpy_linalg 6.16
15000 variant_scipy 5.94
20000 current 10.3
20000 faster_current 5.71
20000 variant_scipy 12.82
25000 faster_current 10.2
25000 variant_scipy 26.54
30000 faster_current 27.49
30000 variant_scipy 41.49
Here, faster_current is the clear winner, but scipy doesn't seem to do so well. Perhaps we should just implement faster_current and be done with it.
Yeah, agreed!