
Be able to construct quadratic models from `np.memmap` array efficiently

juansebastianl opened this issue 2 years ago • 2 comments

Application: If I have a very large square matrix stored as an `np.memmap` array and I try to construct a BQM from it, I often run out of memory, even when the final BQM itself would fit comfortably on my system.

Proposed Solution: If I instead iterate through the memory-mapped matrix in chunks, I don't run out of memory, even for matrices large enough that I normally would (tested up to 25,000x25,000 on 15 GB of RAM):

bqm = dimod.BQM(vartype="BINARY")
# Diagonal entries become linear biases.
bqm.add_variables_from({i: large_matrix[i, i] for i in range(large_matrix.shape[0])})
chunksize = 100
for row_index in range(0, large_matrix.shape[0], chunksize):
    for col_index in range(0, large_matrix.shape[1], chunksize):
        row_index_max = min(row_index + chunksize, large_matrix.shape[0])
        col_index_max = min(col_index + chunksize, large_matrix.shape[1])
        chunk = large_matrix[row_index:row_index_max, col_index:col_index_max]
        # Offset the chunk-local indices back to global indices, and keep only
        # the strict lower triangle (assumes each interaction is stored once).
        quad_biases = [(row_index + i, col_index + j, bias)
                       for (i, j), bias in np.ndenumerate(chunk)
                       if row_index + i > col_index + j and bias != 0]
        bqm.add_interactions_from(quad_biases)

This sort of points toward a solution, but given how the BQM constructor is written, I'm not sure there is an easy way to do this. The constructor is currently Cythonized, and I suspect that for it to keep working there is an implicit load of the whole memory-mapped matrix into RAM, which is a problem for this approach.

Alternatives Considered

There is the possibility of having models that are themselves serialized to disk, so that you never have to iterate through the matrix at all if you pass the underlying file, but that seems even more complex.

juansebastianl avatar Feb 24 '23 03:02 juansebastianl

Interesting. What you're currently doing is pretty reasonable, I think, though you might have a bit more luck using `np.nditer` to control things like buffer size.
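To sketch what that might look like (this is an assumption about how one might wire `np.nditer` into `add_interactions_from`, not an existing dimod API; the generator name and convention of yielding only the strict lower triangle are illustrative):

```python
import numpy as np

def iter_quadratic(q, buffersize=4096):
    """Yield (row, col, bias) triples from the strict lower triangle of q.

    Uses np.nditer with buffering enabled; for an np.memmap the yielded
    blocks are views, so pages are faulted in lazily rather than the whole
    file being read up front.
    """
    n_cols = q.shape[1]
    offset = 0  # running flat index in C order
    it = np.nditer(q, flags=['external_loop', 'buffered'],
                   order='C', buffersize=buffersize)
    for block in it:
        for bias in block:
            i, j = divmod(offset, n_cols)
            offset += 1
            if i > j and bias != 0:
                yield (i, j, float(bias))
```

The triples could then be fed in batches to `bqm.add_interactions_from(...)`; whether this beats manual chunked slicing depends on the per-element Python overhead, which dominates either way.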

You're correct though that we could get some performance benefit from handling this natively. When we add quadratic models from dense NumPy arrays in memory, in some cases we can take advantage of the order of indices to construct the model more efficiently. See:
https://github.com/dwavesystems/dimod/blob/aadde0456e9bff939b886994aec6de3e227c92fe/dimod/binary/cybqm/cybqm_template.pyx.pxi#L226-L242
https://github.com/dwavesystems/dimod/blob/aadde0456e9bff939b886994aec6de3e227c92fe/dimod/include/dimod/abc.h#L574-L586

However, actually getting memmap arrays down into the C++ would be quite a pain. Right now, you're correct that we are using some implicit conversions in NumPy and Cython that are pulling it into memory.

arcondello avatar Feb 27 '23 16:02 arcondello

@juansebastianl Thanks for posting this, I was facing the same issue and struggling to think of a workaround. Your code seems to have a bug, though: the constructed BQM does not match the original QUBO. Also, chunking the matrix is not actually any faster than iterating through the entire thing (at least in my tests). Here's a slightly simpler alternative:

def bqm_from_huge_qubo(q):
    """Create a BinaryQuadraticModel from a QUBO matrix iteratively.

    Intended for very large QUBO matrices that would crash the kernel
    with `dimod.BinaryQuadraticModel.from_qubo(q)`.

    It is very slow (e.g. ~30 mins for 45k variables).
    """
    bqm = dimod.BinaryQuadraticModel(vartype='BINARY')
    # Diagonal entries become linear biases.
    bqm.add_variables_from({i: q[i, i] for i in range(q.shape[0])})
    for x in range(q.shape[0]):
        if x % 1000 == 0:
            print('*', end='')  # coarse progress indicator
        # Batch one row's worth of interactions per call; dimod accumulates
        # biases for repeated pairs, so (x, y) and (y, x) both contribute.
        row = [(x, y, q[x, y]) for y in range(q.shape[0]) if x != y and q[x, y] != 0]
        bqm.add_interactions_from(row)
    return bqm

tim-shea avatar Oct 12 '23 19:10 tim-shea
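For what it's worth, most of that runtime is the per-element Python inner loop. A row-at-a-time NumPy pass can extract the same nonzero triples much faster while still materializing only one row of the (possibly memory-mapped) matrix at a time. A pure-NumPy sketch (the function name is illustrative; feeding the triples to `bqm.add_interactions_from` would work as in the function above):

```python
import numpy as np

def iter_qubo_rows(q):
    """Yield (row, col, bias) triples for the nonzero off-diagonal entries
    of q, one row at a time.

    np.nonzero replaces the Python inner loop, and only a single row is
    ever held in memory, so this also works on np.memmap arrays.
    """
    for x in range(q.shape[0]):
        row = np.asarray(q[x])   # materialize one row at a time
        (cols,) = np.nonzero(row)
        for y in cols:
            if y != x:           # skip the diagonal (linear biases)
                yield (x, int(y), float(row[y]))
```

This keeps the same convention as `bqm_from_huge_qubo`: both (x, y) and (y, x) entries are yielded, relying on dimod to accumulate biases for repeated pairs.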