
Batch binding

Open jgosmann opened this issue 3 years ago • 4 comments

By @arvoelke in this comment:

This is quite a bit outside the scope of this PR, but while I'm thinking about it: I have been wondering how this interface could work efficiently with batches of vectors and multiple consecutive bindings (done offline, without neural networks). I have been doing this quite a bit lately to prepare datasets composed of [S]SPs, and found that it is much faster (e.g., 3-5x) to apply just a single rfft to an entire batch of vectors, do all of the required bindings while staying in the Fourier domain, and then apply a single irfft to the batch only as a final step. I don't really see a way to reuse this code in that context, but it would be nice to have that sort of capability centralized here without replicating parts of the HRR algebra code, so it's maybe something to think about.

jgosmann avatar Mar 16 '21 18:03 jgosmann

Would you envision something where multiple SPs can be bound at once, like so:

base_sp.bind(first_sp, second_sp, third_sp, ...)

Or rather something where you can convert a vector to Fourier space, perform the operations there, and then convert back?

in_fourier = base_sp.to_fourier()
in_fourier.bind(first_in_fourier)
in_fourier.bind(second_in_fourier)
in_fourier.bind(third_in_fourier)
bound_sp = in_fourier.to_sp()

I suppose the first variant would be easier to implement, whereas the second variant gives more flexibility.

The examples are of course rough sketches of how the API could look; actual names would likely differ.

jgosmann avatar Mar 16 '21 18:03 jgosmann

I was also envisioning something like this: you might have the following batches/tensors of SPs:

A : (n, d)
B : (n, d)
C : (1, d)

where d is the SP dimensionality, and n is the batch size.

And then you want to do stuff like A * (B ** 2.5) * C with the usual NumPy semantics (e.g., C is broadcast across the batch axis).

With the HRR algebra, that can be done by staying in the Fourier domain the entire time and using NumPy operations that are vectorized across the batch (e.g., np.fft.rfft(..., axis=-1)). Similarly, batches of "positive" unitary vectors can be generated efficiently by using Euler's formula, np.exp(1j * phi) where |phi| < pi, and setting the DC/Nyquist coefficients to 1 across the batch axis (no FFT required). It's not a lot of code to write out manually on a case-by-case basis, but it involves duplicating bits of what's in NengoSPA.
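For concreteness, here is a rough standalone sketch of that recipe in plain NumPy (all names are made up for illustration, and d is assumed even so that the Nyquist coefficient exists):

import numpy as np

rng = np.random.RandomState(seed=0)
n, d = 1024, 256  # batch size and SP dimensionality

A = rng.randn(n, d)  # batch of SPs, shape (n, d)
C = rng.randn(1, d)  # single SP, broadcast across the batch axis

# Generate a batch of "positive" unitary SPs directly in the Fourier
# domain via Euler's formula -- no FFT needed for the construction.
phi = rng.uniform(-np.pi, np.pi, size=(n, d // 2 + 1))
phi[:, 0] = 0   # DC coefficient set to 1
phi[:, -1] = 0  # Nyquist coefficient set to 1
B_f = np.exp(1j * phi)

# Enter the Fourier domain once per batch ...
A_f = np.fft.rfft(A, axis=-1)
C_f = np.fft.rfft(C, axis=-1)

# ... do all bindings there (binding is an elementwise multiplication,
# fractional binding an elementwise principal-branch power) ...
result_f = A_f * B_f ** 2.5 * C_f

# ... and leave the Fourier domain once, as the final step.
result = np.fft.irfft(result_f, n=d, axis=-1)
assert result.shape == (n, d)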

arvoelke avatar Mar 17 '21 18:03 arvoelke

Something like this already works: you can put SemanticPointer instances into a NumPy array and operate on them directly:

A = np.array([spa.SemanticPointer(...), spa.SemanticPointer(...), ...])
B = np.array([spa.SemanticPointer(...), spa.SemanticPointer(...), ...])
C = spa.SemanticPointer(...)  # or alternatively np.array([spa.SemanticPointer(...)])

result = A * (B ** 2.5) * C  # note the ** operator is not yet implemented, but should work this way once it is

But I suppose you're also thinking of an optimization to avoid doing an IFFT followed by another FFT between operations? For that I could think of two approaches:

  • Handle this transparently by allowing each SemanticPointer to store its data in either the standard vector space or the Fourier space, and convert to the respective other representation only when required by an operation.
  • Add an API to do operations explicitly in the Fourier space. Different options might be viable, just spitballing:
    • with spa.useFourierSpace:
          result = A * (B ** 2.5) * C
      
    • result = spa.inFourierSpace("A * (B ** 2.5) * C") (hm, code in strings, not sure I like that 🤔)
    • result = (A.fft() * (B.fft() ** 2.5) * C.fft()).ifft() (see the sketch after this list)
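As a rough sketch of how that third, explicit variant could behave (FourierSP, fft, and ifft are hypothetical names here, not existing NengoSPA API):

import numpy as np

class FourierSP:
    """Hypothetical container for a batch of SPs held in the Fourier domain."""

    def __init__(self, coefficients):
        self.coefficients = np.asarray(coefficients)

    def __mul__(self, other):
        # Binding (circular convolution) becomes an elementwise multiplication.
        return FourierSP(self.coefficients * other.coefficients)

    def __pow__(self, exponent):
        # Fractional binding becomes an elementwise (principal-branch) power.
        return FourierSP(self.coefficients ** exponent)

    def ifft(self, d):
        # Leave the Fourier domain only once, as the final step.
        return np.fft.irfft(self.coefficients, n=d, axis=-1)

def fft(batch):
    # Enter the Fourier domain once for a whole (n, d) batch.
    return FourierSP(np.fft.rfft(batch, axis=-1))

# Usage, mirroring the third option:
# result = (fft(A) * (fft(B) ** 2.5) * fft(C)).ifft(d)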

Similarly, batches of "positive" unitary vectors can be generated efficiently by using Euler's formula, np.exp(1j * phi) where |phi| < pi, and setting the DC/Nyquist coefficients to 1 across the batch axis (no FFT required).

This could be implemented as an additional generator in the vector_generation module, I suppose.
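A minimal sketch of what such a generator might look like, in the spirit of the existing generators in vector_generation (the name positive_unitary_vectors is made up; a single irfft per vector turns the Fourier coefficients into an actual vector):

import numpy as np

def positive_unitary_vectors(d, rng):
    """Hypothetical generator yielding "positive" unitary vectors of
    dimensionality d, constructed directly in the Fourier domain."""
    assert d % 2 == 0  # assumed even so that the Nyquist coefficient exists
    while True:
        phi = rng.uniform(-np.pi, np.pi, size=d // 2 + 1)
        phi[0] = 0   # DC coefficient set to 1
        phi[-1] = 0  # Nyquist coefficient set to 1
        yield np.fft.irfft(np.exp(1j * phi), n=d)

# e.g.: v = next(positive_unitary_vectors(256, np.random.RandomState(0)))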

jgosmann avatar Apr 10 '21 19:04 jgosmann

When I mentioned the speed-up, that is actually what I was comparing against. I used to do this in the style above, and it does indeed work functionally, but it's quite slow. It's much faster to batch it in NumPy, because everything then happens in a constant number of C routines that each process the entire batch, as opposed to looping and sequentially invoking a Python method that dispatches separate C routines for each SP. Here's a quick-and-dirty benchmarking example:

import numpy as np
import nengo_spa as spa

rng = np.random.RandomState(seed=0)

n, d = 1024, 256
A = rng.randn(n, d)
B = rng.randn(n, d)

sspA = np.asarray([spa.SemanticPointer(a) for a in A])
sspB = np.asarray([spa.SemanticPointer(b) for b in B])

# Compute the results once up front so they can be compared; note that
# assignments inside %timeit do not persist to the session namespace.
sspC = sspA * sspB
C = np.fft.irfft(np.fft.rfft(A) * np.fft.rfft(B))
assert np.allclose(C, [c.v for c in sspC])

%timeit sspA * sspB
%timeit np.fft.irfft(np.fft.rfft(A) * np.fft.rfft(B))
22.4 ms ± 369 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.21 ms ± 999 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

So in this case it's ~10x faster to batch it in NumPy. I'm using a Conda install of numpy==1.18.5 and an AMD Ryzen 9 3900X 12-core processor.

arvoelke avatar Apr 11 '21 01:04 arvoelke