aeon [ENH] Preferred Fourier transform

Describe the feature or idea you want to propose

currently, we have two ways of doing FFT, standard numpy fft and pyfftw which wraps the c code https://www.fftw.org/ . It would be good to benchmark to see if there is any benefit from the extra soft dependency.

Describe your proposed solution

work out which is faster and stick with one approved FFT package. (also test they are equivalent!)

Describe alternatives you've considered, if relevant

there are no doubt other fft packages

Additional context

used in RDST and in the new collection transformers introduced here https://github.com/aeon-toolkit/aeon/pull/508/files

Jun 30 '23 15:06 TonyBagnall

Hi Tony, as a note for RDST, the way I wanted to use FFT was to use Mueen algorithm for computing distance profiles in the Matrix Profile (See below for a comparison of performance for different methods).

If, to compute a sliding dot product given a query $Q$ a time series $T$, we compare a naive numba approach to a numpy convolve (which do not use fft), and a scipy convolve (which use fft for "large enough" arrays, see this), we obtain the following (we clearly see the breakpoint when fft is used on scipy):

ex_fft_convolve

In this use case, FFT is used to perform a faster convolution for large arrays, but I don't know how scipy/numpy fft compares to pyfftw. Is there already something in the aeon API that I could use for sliding dot products, or should I work on including this in a future PR ?

To reproduce the results:

from numba import njit, prange
import numpy as np
import pandas as pd
from numpy import convolve as np_convolve
from scipy.signal import convolve as scipy_convolve

def scipy_sliding_dot_product(Q, T):
    """
    Use scipy convolve to calculate the sliding window dot product.

    Parameters
    ----------
    Q : array, shape=(length, n_features)
        Query array or subsequence

    T : array, shape=(n_timestamps, n_features)
        Time series or sequence

    Returns
    -------
    output : shape=(n_timestamps - (length - 1), n_features)
        Sliding dot product between `Q` and `T`.
    """
    m = T.shape[0]
    length = Q.shape[0]
    k = T.shape[1]
    out = np.zeros((m - (length - 1), k))
    for i in range(k):
        out[:, i] = scipy_convolve(np.flipud(Q[:, i]), T[:, i], mode='valid').real
    return out


def numpy_sliding_dot_product(Q, T):
    """
    Use numpy convolve to calculate the sliding window dot product.

    Parameters
    ----------
    Q : array, shape=(length, n_features)
        Query array or subsequence

    T : array, shape=(n_timestamps, n_features)
        Time series or sequence

    Returns
    -------
    output : shape=(n_timestamps - (length - 1), n_features)
        Sliding dot product between `Q` and `T`.
    """
    m = T.shape[0]
    length = Q.shape[0]
    k = T.shape[1]
    out = np.zeros((m - (length - 1), k))
    for i in range(k):
        out[:, i] = np_convolve(np.flipud(Q[:, i]), T[:, i], mode='valid')
    return out


@njit(fastmath=True, cache=True, parallel=True)
def numba_sliding_dot_product(Q, T):
    """
    Calculate the sliding window dot product with numba.

    Parameters
    ----------
    Q : array, shape=(length, n_features)
        Query array or subsequence

    T : array, shape=(n_timestamps, n_features)
        Time series or sequence

    Returns
    -------
    output : shape=(n_timestamps - (length - 1), n_features)
        Sliding dot product between `Q` and `T`.
    """
    m = T.shape[0]
    length = Q.shape[0]
    n_ft = T.shape[1]
    out = np.zeros((m - (length - 1), n_ft))
    for i in prange(out.shape[0]):
        for j in prange(length):
            for k in prange(n_ft):
                out[i, k] += Q[j, k] * T[i + j, k]
    return out

# In[]:

from matplotlib import pyplot as plt
for i, Q_length in enumerate([0.05,0.1,0.25,0.5]):
    times = pd.DataFrame()
    for length in [100,250,500,1000,2500,5000,7500,10000]:
        Q = np.random.rand(int(length*Q_length),1)
        T = np.random.rand(length, 1)
        r_scipy = %timeit -o scipy_sliding_dot_product(Q, T)
        r_numpy = %timeit -o numpy_sliding_dot_product(Q, T)
        r_numba = %timeit -o numba_sliding_dot_product(Q, T)
        times.loc[length, 'numpy convolve'] = r_numpy.average
        times.loc[length, 'scipy convolve'] = r_scipy.average
        times.loc[length, 'numba convolve'] = r_numba.average
    times.plot(
        title='Q length ratio to T :'+str(Q_length),
        xlabel='T length',
        ylabel='runtime (s)',
        ax=ax[i//2, i%2]
    )
plt.show()

Jul 01 '23 11:07 baraline

sorry @baraline I didnt see you comment!

I have done a simple comparison using the PeriodogramTransformer

from aeon.transformations.collection import PeriodogramTransformer
import time
import numpy as np

p1 = PeriodogramTransformer(use_pyfftw=True)
p2 = PeriodogramTransformer(use_pyfftw=False)


for i in range(1000,200001,1000):
    X = np.random.random((i, 2, i))
    start=time.time()
    X2 = p1.fit_transform(X)
    t1 = time.time()-start
    start=time.time()
    X2 = p2.fit_transform(X)
    print(f"{i},{t1},{time.time()-start}")

numpy FFT takes about 70% of the time of pyfftw.

Given this is the only place pyfftw is used, and that all tests pass with it switched off, I think this is enough evidence to remove it.

Oct 16 '24 08:10 TonyBagnall

y axis is time in seconds

Oct 16 '24 08:10 TonyBagnall

aligned ratio of numpy time/pyfftw time

Oct 16 '24 08:10 TonyBagnall

Regarding sliding dot products, there is the following in aeon:

aeon.utils.numba.general.sliding_dot_product (Numba)
scipy.signal.correlate in https://github.com/aeon-toolkit/aeon/blob/fb877e7bd83c425bf47dd2802cbbc880190b3b48/aeon/distances/_sbd.py#L242-L243
scipy.signal.convolve in your fft_sliding_dot_product implementation in similarity search
scipy.signal.fftconvolve in: https://github.com/aeon-toolkit/aeon/blob/fb877e7bd83c425bf47dd2802cbbc880190b3b48/aeon/transformations/series/_bkfilter.py#L101
numpy.fft.fft in https://github.com/aeon-toolkit/aeon/blob/fb877e7bd83c425bf47dd2802cbbc880190b3b48/aeon/distances/_mpdist.py#L148-L149

Oct 16 '24 14:10 SebastianSchmidl

aeon aeon copied to clipboard

[ENH] Preferred Fourier transform

Describe the feature or idea you want to propose

Describe your proposed solution

Describe alternatives you've considered, if relevant

Additional context

aeon
aeon copied to clipboard