scikit-learn-extra
Fastfood is slow
Fastfood is somewhat slower than RBFSampler (while in theory it should be faster).
Google Colab timings (sklearn 0.24.2, sklearn_extra 0.2.0, numpy 1.19.5)
Laptop timings (Ubuntu 20.04, Intel 8300H, 32GB RAM) (sklearn 0.23.2, sklearn_extra 0.2.0, numpy 1.19.2)
Sample code
!pip install scikit-learn-extra
import time
import numpy as np
from matplotlib import pyplot as plt
from sklearn.kernel_approximation import RBFSampler
from sklearn_extra.kernel_approximation import Fastfood
X = np.random.normal(size=(100_000, 100))
n_components = [64, 128, 256, 512, 1024, 2048]
timings_ff = []
timings_rbfs = []
# time Fastfood fit_transform for each output dimensionality
for n in n_components:
    ff = Fastfood(n_components=n, random_state=1)
    tic = time.perf_counter()
    X_t = ff.fit_transform(X)
    tac = time.perf_counter()
    timings_ff.append(tac - tic)

# time RBFSampler on the same data for comparison
for n in n_components:
    rbfs = RBFSampler(n_components=n, random_state=1)
    tic = time.perf_counter()
    X_t = rbfs.fit_transform(X)
    tac = time.perf_counter()
    timings_rbfs.append(tac - tic)
plt.plot(n_components, timings_ff);
plt.plot(n_components, timings_rbfs);
Changing tradeoff_mem_accuracy to 'mem' did not affect speed.
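For reference, the 'mem' timing only differed in the constructor argument; a minimal sketch of that variant, using the same data and loop as above:

# Same timing loop as above, but with the memory-oriented tradeoff;
# in my runs this made no measurable difference.
for n in n_components:
    ff_mem = Fastfood(n_components=n, tradeoff_mem_accuracy="mem", random_state=1)
    tic = time.perf_counter()
    ff_mem.fit_transform(X)
    tac = time.perf_counter()
    print(n, tac - tic)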
The bottleneck seems to be the _phi function:
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.289 0.144 0.289 0.144 _fastfood.py:104(_approx_fourier_transformation_multi_dim)
1 0.000 0.000 0.000 0.000 _fastfood.py:108(_l2norm_along_axis1)
1 0.000 0.000 0.000 0.000 _fastfood.py:112(_uniform_vector)
1 0.045 0.045 0.389 0.389 _fastfood.py:118(_apply_approximate_gaussian_matrix)
1 0.036 0.036 0.036 0.036 _fastfood.py:136(_scale_transformed_data)
1 0.441 0.441 0.441 0.441 _fastfood.py:144(_phi)
1 0.000 0.000 0.007 0.007 _fastfood.py:153(fit)
1 0.000 0.000 0.000 0.000 _fastfood.py:190(<listcomp>)
1 0.000 0.000 0.897 0.897 _fastfood.py:207(transform)
1 0.000 0.000 0.000 0.000 _fastfood.py:72(_is_number_power_of_two)
1 0.000 0.000 0.000 0.000 _fastfood.py:76(_enforce_dimensionality_constraints)
1 0.000 0.000 0.024 0.024 _fastfood.py:89(_pad_with_zeros)
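A table like the one above can be reproduced with cProfile; here is a sketch (the n_components value is an assumption, not necessarily the one used for the numbers above):

import cProfile

# Profile a single fit_transform call and sort by total time spent per function
ff = Fastfood(n_components=2048, random_state=1)
cProfile.run("ff.fit_transform(X)", sort="tottime")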
numpy trigonometric functions (cos, sin) are slow (individually they take about as long as _approx_fourier_transformation_multi_dim), and numba does not help. Maybe a pure Cython implementation of the _phi function would be faster.
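A rough way to see the trigonometric cost in isolation (a hedged sketch; the matrix shape is an assumption meant to roughly match what _phi processes for n_components=2048, not the exact internal shape):

import time
import numpy as np

# cos/sin alone on a matrix of roughly the size _phi handles
A = np.random.normal(size=(100_000, 2048))
tic = time.perf_counter()
np.cos(A)
np.sin(A)
print("cos + sin:", time.perf_counter() - tic, "s")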
Thanks for the investigation @GLevV !
There are two issues there:
- First, _phi (and therefore transform) returns (n_samples, 2*n_components) instead of (n_samples, n_components) for tradeoff_mem_accuracy == "accuracy". That is a bug. Also, because it manipulates a matrix 2x larger than it should, it is therefore 2x slower. We should really add a unit test to detect this.
- Generally, the function _phi should be modified to use more in-place operations (https://stackoverflow.com/a/22952411/1791279), though computing cos and sin would likely be expensive in any case.

A PR to fix/improve that situation would be very welcome.
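For the second point, a minimal sketch of what a more in-place _phi could look like for the "accuracy" case (this is not the current _fastfood.py code; the function name, signature, and shape handling are assumptions):

import numpy as np

def _phi_accuracy_inplace(X_transformed, n_components):
    # Write cos and sin directly into one preallocated output buffer
    # instead of allocating separate arrays and concatenating them.
    n_samples, n_cols = X_transformed.shape
    out = np.empty((n_samples, 2 * n_cols), dtype=X_transformed.dtype)
    np.cos(X_transformed, out=out[:, :n_cols])
    np.sin(X_transformed, out=out[:, n_cols:])
    out *= 1.0 / np.sqrt(n_components)  # in-place scaling, no extra copy
    return out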
Judging by this implementation, it should return an N x N matrix and be used with SVM(kernel='precomputed').
On the other hand, in the same repo they use the [cos(X), sin(X)] matrix as a feature matrix for the next model (which will obviously have 2*n_components columns). A straightforward solution would be to divide n_components by 2 (or decrease the power of 2 by 1) for the "accuracy" case.
The [cos, sin] matrix is the matrix of real and imaginary components of a Fourier transform. In such cases, sometimes only the real part of the transformation is kept and the imaginary part is dropped, but then there would be no difference between the 'accuracy' and 'mem' cases.
We can either warn users that in the accuracy case the feature space will be twice as large, or we can drop the 'accuracy' case for now and just keep the random kitchen sinks method.
I don't think "[cos, sin] matrix is a matrix of real and imaginary components of Fourier transform" is true. If that were the case, we could have used an FFT and obtained a faster implementation, but in fact we don't use a Fourier transform here, or am I wrong?
Well, I don't know the particular implementation, but in the paper "Fastfood: Approximate Kernel Expansions in Loglinear Time" [1] they state the feature map phi_j(x) = exp(i [Vx]_j) / sqrt(n), which is the same as [cos(X), sin(X)] / sqrt(n), so I assumed that cos(X) is the real part and sin(X) is the imaginary part, but I could be wrong.
Yes, this is not a Fourier transform. A Fourier transform involves a sum (or an integral) of exponentials, and here we just have an exponential without a sum. But you are right that cos(X) and sin(X) are the real and imaginary parts of the feature map phi_j(x). And I think the 'mem' case is exactly as you described: we drop the imaginary part.
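To make the variants being discussed concrete, a small sketch of the corresponding feature maps (V is just a toy random projection here; whether the last form matches the library's 'mem' option exactly is an assumption):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)           # one input sample
V = rng.normal(size=(8, 10))      # stand-in for the Fastfood projection
n = V.shape[0]

# Complex feature map: phi_j(x) = exp(i [Vx]_j) / sqrt(n)
phi_complex = np.exp(1j * (V @ x)) / np.sqrt(n)

# Real and imaginary parts stacked -> 2n real features ('accuracy'-style)
phi_both = np.concatenate([np.cos(V @ x), np.sin(V @ x)]) / np.sqrt(n)

# Real part only, with a random phase offset b (random kitchen sinks style)
# -> n real features
b = rng.uniform(0, 2 * np.pi, size=n)
phi_real = np.sqrt(2.0 / n) * np.cos(V @ x + b)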
Well, I guess the only thing left is to replace numpy math operations with Python ones where needed, since Python operations are faster on single numbers. Like these:
https://github.com/scikit-learn-contrib/scikit-learn-extra/blob/c1dfd25aa76e1d11655329ad04d3a263be8ca7eb/sklearn_extra/kernel_approximation/_fastfood.py#L80
https://github.com/scikit-learn-contrib/scikit-learn-extra/blob/c1dfd25aa76e1d11655329ad04d3a263be8ca7eb/sklearn_extra/kernel_approximation/_fastfood.py#L141
https://github.com/scikit-learn-contrib/scikit-learn-extra/blob/c1dfd25aa76e1d11655329ad04d3a263be8ca7eb/sklearn_extra/kernel_approximation/_fastfood.py#L146
https://github.com/scikit-learn-contrib/scikit-learn-extra/blob/c1dfd25aa76e1d11655329ad04d3a263be8ca7eb/sklearn_extra/kernel_approximation/_fastfood.py#L151
And maybe change the default tradeoff to 'mem', since it would be more consistent and faster.
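A quick illustration of the scalar overhead (timings are machine-dependent; this is only meant to show the direction of the difference):

import math
import time
import numpy as np

x = 1234.5678

# Python's math module avoids numpy's per-call ufunc overhead on scalars
tic = time.perf_counter()
for _ in range(1_000_000):
    math.sqrt(x)
t_math = time.perf_counter() - tic

tic = time.perf_counter()
for _ in range(1_000_000):
    np.sqrt(x)
t_np = time.perf_counter() - tic

print(f"math.sqrt: {t_math:.3f} s, np.sqrt: {t_np:.3f} s")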