
Elementwise Kernel optimizations.

Open mnicely opened this issue 3 years ago • 3 comments

Hi @evanmayer, do you have a way to run your code without hardware? I've found a few places to make improvements. If not, I've added ideas below in case you ever want to try them.

  1. Input to channelizer is read-only, so you might not need to initialize x to zeros
x = cp.empty(len(x)+len(x)%n_branches, dtype=np.complex128) + x  # Line 39
  2. Move to elementwise kernel to reduce data transfer and launch overhead
xcorr_array = psd_0 * (cp.conj(psd_1) * cp.exp(2j * (cp.pi) * freqs * (total_lag / rate) ))  # Line 106
  3. Move to elementwise kernel to reduce data transfer and launch overhead
xcorr *= cp.exp(2j * cp.pi * freqs * (integer_lag / rate) )  # Line 185
# Prepare to fit residual phase gradient:
phases = cp.angle(xcorr)  # Line 187
# Due to receiver bandpass shape, edge frequencies have less power => less certain phase
# Assign weights accordingly
weights = cp.abs(xcorr)  # Line 190
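
To make the last item concrete without a GPU, here is a minimal plain-Python sketch of the per-element work a single fused kernel would do for the three statements at lines 185 to 190: rotate one bin, then take its phase and magnitude. The function name `rotate_phase_weight` and the sample values are hypothetical and chosen only for illustration. It is a reference for unit tests, not the project's implementation.

```python
import cmath
import math


def rotate_phase_weight(xcorr, freq, integer_lag, rate):
    """Per-element reference: rotate one bin, then return (rotated, phase, weight)."""
    # Same phase rotation as the cp.exp(...) expression, applied to one bin
    rotated = xcorr * cmath.exp(2j * math.pi * freq * (integer_lag / rate))
    # cp.angle and cp.abs correspond to cmath.phase and abs for a scalar
    return rotated, cmath.phase(rotated), abs(rotated)


# Hypothetical sample values, for illustration only
rotated, phase, weight = rotate_phase_weight(1 + 1j, freq=0.25, integer_lag=2, rate=1.0)
```

Doing all three steps in one pass is what would save the intermediate reads and writes and the extra kernel launches that the separate `cp.exp`, `cp.angle`, and `cp.abs` calls incur.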

mnicely avatar Mar 15 '21 15:03 mnicely

Thanks! I'll write some unit tests and use them to quantify the speedup.

evanmayer avatar Mar 16 '21 03:03 evanmayer

Just sat down to look at this, realized I don't yet know how to do everything here. I've incorporated 1. and will push that, at least. I assume "move to elementwise kernel" refers to writing something like this, from the CuPy docs for the operations I'm doing?

evanmayer avatar Mar 16 '21 06:03 evanmayer

Exactly! For good examples, compare the following:

  1. https://github.com/rapidsai/cusignal/blob/branch-0.19/python/cusignal/windows/windows.py
  2. https://github.com/scipy/scipy/blob/a376262/scipy/signal/windows/windows.py

It could look something like this:

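import cupy as cp

xcorr_array_kernel = cp.ElementwiseKernel(
    # Inputs. Assumes psd_0 and psd_1 are complex128 (they are passed to cp.conj above).
    "complex128 psd_0, complex128 psd_1, float64 freqs, float64 total_lag, float64 rate",
    # Output: the name must match the variable assigned in the kernel body.
    "complex128 output",
    """
    thrust::complex<double> conj_psd_1 { thrust::conj( psd_1 ) };
    thrust::complex<double> temp { exp(
        thrust::complex<double>( 0, 2 * M_PI * freqs * div ) ) };

    output = psd_0 * conj_psd_1 * temp;
    """,
    "xcorr_array_kernel",
    options=("-std=c++11",),
    # Computed once per thread; total_lag and rate must be passed as scalars here.
    loop_prep="const double div { total_lag / rate };",
)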

This assumes freqs is a float, not an array.

mnicely avatar Mar 16 '21 10:03 mnicely
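
For the unit tests mentioned earlier in the thread, a CPU-only reference can help check a kernel like the one above without hardware. The following is a minimal sketch in plain Python that mirrors the kernel body one element at a time. The names `xcorr_element` and `xcorr_reference` and the sample inputs are hypothetical, and complex inputs for `psd_0` and `psd_1` are an assumption.

```python
import cmath
import math


def xcorr_element(psd_0, psd_1, freq, total_lag, rate):
    """Mirror of the kernel body for a single frequency bin."""
    # In the kernel this quotient is hoisted into loop_prep; here it is recomputed per call
    div = total_lag / rate
    phase_term = cmath.exp(complex(0.0, 2.0 * math.pi * freq * div))
    return psd_0 * psd_1.conjugate() * phase_term


def xcorr_reference(psd_0, psd_1, freqs, total_lag, rate):
    """Apply the per-element formula across equal-length sequences."""
    return [
        xcorr_element(a, b, f, total_lag, rate)
        for a, b, f in zip(psd_0, psd_1, freqs)
    ]


# Hypothetical inputs, for illustration only
expected = xcorr_reference(
    [1 + 2j, 3 - 1j], [2 - 1j, 1j], [0.0, 0.5], total_lag=1.0, rate=1.0
)
```

On a machine with a GPU, the kernel output could be copied back with `cp.asnumpy` and compared against `expected` within a floating-point tolerance.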