The "transpose trick" for quick zca approximation solving issue #55
I implemented the more or less well-known "transpose trick" as a fix for issue #55; my code is here. Is it alright? Do I have to include changes not only in keras-preprocessing/keras_preprocessing/image/image_data_generator.py, but also in keras/keras/preprocessing/image.py?
Summary
Added a variable zca_rotated that lets the user choose whether the SVD in ZCA whitening is computed on the data rows = features (False) or on the columns = examples (True), saving time when the smaller of the two is chosen. When False is selected, ZCA whitening is calculated in the usual way. In some cases this solves the problem of the SVD never completing when the covariance matrix is too large, by reducing the size of the matrix being decomposed. The idea is explained with examples in issue #55 on GitHub. The method is not an approximation: it gives exactly the same result as the current method, with the difference of saving time when the number of images is smaller than the number of features. I only needed to change a few lines of code, but for them to work I am not sure whether I also have to change keras/keras/preprocessing/image.py, e.g. to define the new variable zca_rotated there.
Related Issues
#55, #8706 in keras-team/keras
PR Overview
- [y] This PR requires new unit tests [y/n] (make sure tests are included) I am not sure what is meant by "unit test", but below is the minimal code of the new implementation; it can also be seen here:
from keras.datasets import cifar10
import numpy as np
from scipy import linalg
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
x = X_train[:1000]
flat_x = np.reshape(x, (x.shape[0], x.shape[1] * x.shape[2] * x.shape[3]))
# line below lets you choose if you want to calculate the standard zca, like in keras (=False),
# or if you want to treat the input data x as rotated, to simplify the calculation if x.shape[0] = m < x.shape[1] = n
zca_rotated = True
# normalize x:
flat_x = flat_x / 255.
flat_x = flat_x - flat_x.mean(axis=0)
# CHANGES HAPPEN BELOW.
# if m >= n execute the svd as usual
if zca_rotated == False:
    sigma = np.dot(flat_x.T, flat_x) / flat_x.shape[0]
    u, s, _ = linalg.svd(sigma)
# and if m < n do the transpose trick
if zca_rotated == True:
    sigma = np.dot(flat_x, flat_x.T) / flat_x.shape[0]
    u, s, _ = linalg.svd(sigma)
    u = np.dot(flat_x.T, u) / np.sqrt(s * flat_x.shape[0])
s_inv = 1. / np.sqrt(s[np.newaxis] + 0.1)  # the 0.1 is the epsilon value for zca
principal_components = (u * s_inv).dot(u.T)
whitex = np.dot(flat_x, principal_components)
- [y] This PR requires to update the documentation [y/n] (make sure the docs are up-to-date) I provided a description of the new variable zca_rotated that might be included in the documentation: zca_rotated: Boolean. Determines whether the SVD in ZCA is computed on the data rows = features (False) or on the columns = examples (True), saving time when the smaller of the two is chosen. Gives exactly the same result as the standard zca_whitening. Default is False.
- [y] This PR is backwards compatible [y/n] The default is zca_rotated=False, and in that case there is no change to the current behaviour: ZCA is calculated exactly as it is now.
- [n] This PR changes the current API [y/n] (all API changes need to be approved by fchollet) I am not sure, but I think it doesn't change the API.
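As a rough stand-in for a unit test, the two branches can be checked against each other on random data. This is only a sketch: the shapes m, n and the epsilon of 0.1 mirror the snippet above, and the random Gaussian input (zero mean in expectation) stands in for the mean-centered image batch.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
m, n = 20, 50  # fewer examples than features: the case the transpose trick targets
flat_x = rng.standard_normal((m, n))  # stand-in for the centered, flattened images
eps = 0.1  # the zca epsilon, as in the snippet above

# standard path: decompose the n x n feature covariance
sigma = np.dot(flat_x.T, flat_x) / m
u, s, _ = linalg.svd(sigma)
pc = (u * (1. / np.sqrt(s[np.newaxis] + eps))).dot(u.T)
white_standard = np.dot(flat_x, pc)

# transpose trick: decompose the smaller m x m matrix instead
sigma_small = np.dot(flat_x, flat_x.T) / m
u2, s2, _ = linalg.svd(sigma_small)
u2 = np.dot(flat_x.T, u2) / np.sqrt(s2 * m)  # map eigenvectors back to feature space
pc2 = (u2 * (1. / np.sqrt(s2[np.newaxis] + eps))).dot(u2.T)
white_trick = np.dot(flat_x, pc2)

print(np.allclose(white_standard, white_trick))
```

The whitened outputs agree because the data has no component along the zero-eigenvalue directions that the small decomposition omits.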
Hello,
With ZCA whitening not being as popular as it was in the early days of deep learning, I don't think we should include this in Keras-prepro.
I'm open to discussion :)
Thanks for the answer, @Dref360 ! Is there any downside to including this ZCA change in Keras-prepro? Maybe ZCA isn't that popular among Keras users because the current calculation takes very long for large images (just speculating here)? Cheers
Hi @Dref360 , I found a way to make my method not just an approximation: it gives exactly the same result as the current ZCA transformation and still runs quicker. I can also make it work without an additional user-specified variable: with an if-clause the code can simply determine which SVD approach is quicker and calculate accordingly. It would look like this, just 5 new lines:
# CHANGES HAPPEN BELOW.
# if m >= n execute the svd as usual
if flat_x.shape[0] >= flat_x.shape[1]:
    sigma = np.dot(flat_x.T, flat_x) / flat_x.shape[0]
    u, s, _ = linalg.svd(sigma)
# and if m < n do the transpose trick
else:
    sigma = np.dot(flat_x, flat_x.T) / flat_x.shape[0]
    u, s, _ = linalg.svd(sigma)
    u = np.dot(flat_x.T, u) / np.sqrt(s * flat_x.shape[0])
Let me know if you think it's valuable and I will make the changes to the keras pre-pro file and submit them.
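For illustration, the branching above could be wrapped into a standalone helper (the name `zca_basis` is hypothetical, not part of the PR diff) that picks the cheaper SVD automatically and returns the eigenvectors and eigenvalues of the feature covariance:

```python
import numpy as np
from scipy import linalg

def zca_basis(flat_x):
    """Return eigenvectors u (features x k) and eigenvalues s of the
    feature covariance flat_x.T @ flat_x / m, via whichever SVD is smaller."""
    m, n = flat_x.shape
    if m >= n:
        # if m >= n execute the svd on the n x n covariance as usual
        sigma = np.dot(flat_x.T, flat_x) / m
        u, s, _ = linalg.svd(sigma)
    else:
        # if m < n do the transpose trick on the m x m matrix
        sigma = np.dot(flat_x, flat_x.T) / m
        u, s, _ = linalg.svd(sigma)
        u = np.dot(flat_x.T, u) / np.sqrt(s * m)
    return u, s

# quick check on random data: u must solve the eigenproblem of the covariance
rng = np.random.default_rng(1)
x = rng.standard_normal((10, 40))  # m < n, so the transpose-trick branch runs
u, s = zca_basis(x)
sigma = x.T @ x / x.shape[0]
print(np.allclose(sigma @ u, u * s))
```

Either branch yields unit-norm eigenvectors, so the downstream whitening code needs no changes.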
Hi, @Nestak2
Can you please explain why we need to divide by np.sqrt(s*flat_x.shape[0]) in the last line of the above code? Thank you.
@tranngocphu Hi, if I remember correctly, np.sqrt(s*flat_x.shape[0]) is a normalization factor for the vectors u and is connected to their eigenvalues. So you might be able to just leave it out. You can find versions of the factor without the term flat_x.shape[0], i.e. only np.sqrt(s), if sigma is calculated beforehand as sigma=np.dot(flat_x, flat_x.T) without the division by flat_x.shape[0]. I wrote a little Medium article a few years ago about the transpose trick; it doesn't give more explanation about this factor, but it shows the reader the mathematical justification for the transpose trick and gives a bit more background. If you want to read it, it is here
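The factor can also be checked numerically. If u are the eigenvectors of flat_x @ flat_x.T / m, then each column of flat_x.T @ u has squared norm s * m, so dividing by np.sqrt(s * m) makes the mapped-back eigenvectors unit length. A sketch on random stand-in data:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(2)
m, n = 15, 60
flat_x = rng.standard_normal((m, n))  # stand-in for the centered data matrix

sigma = np.dot(flat_x, flat_x.T) / m   # the small m x m matrix from the trick
u, s, _ = linalg.svd(sigma)

back = np.dot(flat_x.T, u)             # candidate eigenvectors of flat_x.T @ flat_x / m
# their column norms equal sqrt(s * m), not 1:
print(np.allclose(np.linalg.norm(back, axis=0), np.sqrt(s * m)))

unit = back / np.sqrt(s * m)           # the division from the PR normalizes them
print(np.allclose(np.linalg.norm(unit, axis=0), 1.0))
```

So the factor cannot simply be dropped if the whitening matrix is built from these vectors; without it, each eigen-direction would be scaled by its norm.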