
Breaks immediately on very large problems

kieranbrowne opened this issue 6 years ago • 3 comments

Hey, great library! I'm curious why this function crashes on large assignment problems. It worked for me on 5,000 and 10,000 position problems but crashed immediately at 55,000. At first I thought it was a memory issue, but it seems to crash long before exhausting all the memory on the machines I tried. Do you have any idea why?

kieranbrowne avatar Nov 06 '18 02:11 kieranbrowne

Hi @kieranbrowne

I am afraid that even if the lib did not crash, you would have to wait literally ages to get the result. The time complexity is cubic even after all the algorithmic smart tricks and computational optimizations, and 55k^3 is a big number :)
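
To put rough numbers on that (a quick back-of-the-envelope sketch, not a claim about this library's exact constants): timing a small instance and scaling it cubically already lands in the hours range, and the dense cost matrix alone runs to tens of gigabytes.

import time
import numpy as np
from lapjv import lapjv

# Time a small square problem and scale by (N / n)**3; machine-dependent,
# order-of-magnitude only.
n, N = 2000, 55000
cost = np.random.rand(n, n)
start = time.time()
lapjv(cost)
elapsed = time.time() - start
print("n=%d took %.1f s" % (n, elapsed))
print("extrapolated to N=%d: roughly %.0f hours" % (N, elapsed * (N / n) ** 3 / 3600))

# The dense cost matrix alone, before any solving:
print("float64: %.1f GB" % (N * N * 8 / 1e9))  # ~24.2 GB
print("float32: %.1f GB" % (N * N * 4 / 1e9))  # ~12.1 GB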

Now that you have lost all hope, may I ask you two things:

  1. The exact error details: stack trace, error message, etc.
  2. Sample data and code to reproduce the crash.

With those, I can debug it and tell you the exact reason why it did not work.

vmarkovtsev avatar Nov 06 '18 07:11 vmarkovtsev

Hey, darn. That's okay, thanks anyway.

Here's the example.

from lapjv import lapjv
import numpy as np

# imaginary data (np.empty is uninitialized; only the shape matters for the repro)
data = np.empty([55000, 2])

def dists(a, b):
    return np.linalg.norm(a - b, axis=1)

def mycdist(grid, positions):
    """Because scipy.spatial.distance.cdist was giving memory errors; sadly this version is slower."""
    # Accumulate 1000 rows of squared distances at a time as float16 to keep peak memory down.
    out = np.array([]).astype(np.float16).reshape([0, len(positions)])
    tmp = []
    for i in range(len(positions)):
        tmp.append(np.square(dists(grid, positions[i])).astype(np.float16))
        if i % 1000 == 0:
            print(i // 1000, "/", len(positions) // 1000)
            out = np.concatenate((out, np.nan_to_num(np.array(tmp).astype(np.float16))), axis=0)
            tmp = []
    # flush the rows accumulated since the last multiple of 1000
    out = np.concatenate((out, np.nan_to_num(np.array(tmp).astype(np.float16))), axis=0)
    return out

n_x, n_y = (200, 275)  # two numbers that multiply to 55,000

grid = np.dstack(np.meshgrid(np.linspace(0, 1, n_y), np.linspace(0, 1, n_x))).reshape(-1, 2)
cost_matrix = mycdist(grid, data)
cost_matrix = cost_matrix * (1000 / cost_matrix.max())
row_asses, col_asses, _ = lapjv(cost_matrix)
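
As an aside, a minimal sketch of an alternative to the list-based accumulation above (float32 and the chunk size are assumptions on my part, and the full 55,000 x 55,000 float32 result is still about 12 GB): the same squared-distance matrix can be built block by block with plain broadcasting.

import numpy as np

def chunked_sq_dists(grid, positions, chunk=1000):
    """Squared Euclidean distances from each position to every grid point, built in blocks."""
    n, m = len(positions), len(grid)
    out = np.empty((n, m), dtype=np.float32)
    for start in range(0, n, chunk):
        block = positions[start:start + chunk]        # (chunk, 2)
        diff = grid[None, :, :] - block[:, None, :]   # (chunk, m, 2) via broadcasting
        out[start:start + chunk] = np.square(diff).sum(axis=2, dtype=np.float32)
    return out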

Unfortunately there's no error message in Jupyter, just a popup saying the kernel died. I've been watching top while it runs, and it doesn't get anywhere near exhausting the system memory, though.
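
One way to get more than a dead-kernel popup (assuming the crash is a hard crash in native code rather than a Python exception): run the reproduction as a plain script with faulthandler enabled, which prints a traceback even on a segfault.

# repro.py -- run with:  python repro.py   (or: python -X faulthandler repro.py)
import faulthandler
faulthandler.enable()  # dumps the Python traceback on SIGSEGV and friends

import numpy as np
from lapjv import lapjv

n = 40000  # size is an assumption; raise it until the crash reproduces
cost_matrix = np.random.rand(n, n)  # ~12.8 GB of float64 at n=40000
row_asses, col_asses, _ = lapjv(cost_matrix)
print("finished without crashing:", row_asses.shape)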

kieranbrowne avatar Nov 09 '18 00:11 kieranbrowne

Hold on, I think there was a separate problem where lapjv wouldn't accept float16, but even using the standard scipy cdist function I still hit an error.

from lapjv import lapjv
import numpy as np
from scipy.spatial.distance import cdist

data = np.empty([55000, 2])  # imaginary data again; only the shape matters
n_x, n_y = (200, 275)  # two numbers that multiply to 55,000

grid = np.dstack(np.meshgrid(np.linspace(0, 1, n_y), np.linspace(0, 1, n_x))).reshape(-1, 2)
cost_matrix = cdist(grid[0:40000], data[0:40000])
cost_matrix = cost_matrix * (1000 / cost_matrix.max())
row_asses, col_asses, _ = lapjv(cost_matrix)
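
Worth noting alongside this (simple arithmetic, not a diagnosis): even the reduced 40,000 x 40,000 float64 matrix from cdist is sizable on its own, which matters on a small cloud box; and if float16 really is rejected, upcasting before the call is a cheap experiment (which dtypes lapjv accepts is an assumption I have not verified).

n = 40000
print("%d x %d float64 cost matrix: %.1f GB" % (n, n, n * n * 8 / 1e9))  # ~12.8 GB

# cheap experiment for the float16 question, before calling lapjv:
#     cost_matrix = cost_matrix.astype(np.float32)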

I'll need to boot the DigitalOcean box back up to get a stack trace.

kieranbrowne avatar Nov 09 '18 00:11 kieranbrowne