How to insert Spatial Transformer layer BEFORE a convnet
I would like to have a Spatial Transformer layer before a pretrained convnet, such as the Keras ResNet50. To that end I have made the various attempts below to connect the SpatialTransformer's output to the ResNet50's input, but I get errors every time. Furthermore, I'm not entirely sure how these layers are supposed to work with the Keras Functional API.
from seya.layers.attention import SpatialTransformer
import numpy as np
from keras.layers import MaxPooling2D, Convolution2D, Dense, Activation, Flatten, Input, GlobalAveragePooling2D
from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.optimizers import Adam
def locnet(input):
    # initial weights
    b = np.zeros((2, 3), dtype='float32')
    b[0, 0] = 1
    b[1, 1] = 1
    W = np.zeros((50, 6), dtype='float32')
    weights = [W, b.flatten()]
    # original from https://github.com/EderSantana/seya/blob/master/examples/Spatial%20Transformer%20Networks.ipynb
    # locnet = Sequential()
    # locnet.add(MaxPooling2D(pool_size=(2, 2), input_shape=input_shape))
    # locnet.add(Convolution2D(20, 5, 5))
    # locnet.add(MaxPooling2D(pool_size=(2, 2)))
    # locnet.add(Convolution2D(20, 5, 5))
    #
    # locnet.add(Flatten())
    # locnet.add(Dense(50))
    # locnet.add(Activation('relu'))
    # locnet.add(Dense(6, weights=weights))
    # # locnet.add(Activation('sigmoid'))
    # translated the above to functional API
    pool1 = MaxPooling2D()(input)
    conv1 = Convolution2D(20, 5, 5)(pool1)
    pool2 = MaxPooling2D()(conv1)
    conv2 = Convolution2D(20, 5, 5)(pool2)
    flatten = Flatten()(conv2)
    dense = Dense(50)(flatten)
    # dense = Activation('relu')(dense)
    params = Dense(6, weights=weights, name='affine_params')(dense)
    return params
def spatial_transformer_net(input_shape, num_categories):
    '''
    plug a spatial transformer network into a Keras resnet50
    '''
    # make an input tensor
    i = Input(input_shape)
    # get a locnet
    loc = locnet(i)
    # get a spatial transformer
    st = SpatialTransformer(localization_net=loc, downsample_factor=2, input_shape=input_shape)
    # get a pretrained convnet
    ####################################################### can the SpatialTransformer be plugged in as the input tensor?
    base_model = ResNet50(weights='imagenet', include_top=False)(st)
    # freeze it
    for layer in base_model.layers:
        layer.trainable = False
    ################################################################################## or do we plug it in here somehow?
    # base_model.input = st.output
    # base_model.input = st
    # set output
    Z = base_model.get_layer('activation_49').output
    Z = GlobalAveragePooling2D()(Z)
    Z = Dense(1024, activation='relu')(Z)
    Z = Dense(num_categories, activation='softmax')(Z)
    # create the Keras functional model
    model = Model(input=i, output=Z)
    # compile the model
    model.compile(optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0), loss='categorical_crossentropy', metrics=['accuracy'])
    return model
I have it working, but it is still not tested (I don't know if the output makes sense). Here is my code:
data = Input(shape=(3,227,227), dtype='float32', name='_input')
b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1
b[1, 1] = 1
W = np.zeros((50, 6), dtype='float32')
weights = [W, b.flatten()]
locnet = Flatten()(data)
locnet = Dense(50)(locnet)
locnet = Activation('relu')(locnet)
locnet = Dense(6, weights=weights)(locnet)
locnet_model = Model(input=data, output=locnet)
x = SpatialTransformer(localization_net=locnet_model, downsample_factor=1, return_theta=False)(data)
The locnet is just a dummy network.
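As a side note on the weight initialization above (this just restates what the code already does, nothing new): with W set to zero and the bias set to [1, 0, 0, 0, 1, 0], the final Dense(6) layer initially outputs the parameters of an identity affine transform, so the SpatialTransformer starts by passing its input through unchanged and only learns to deviate from the identity during training. A quick check:

import numpy as np

# identity affine transform: theta = [[1, 0, 0], [0, 1, 0]]
b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1
b[1, 1] = 1
W = np.zeros((50, 6), dtype='float32')  # zero weights -> the layer's output is just the bias at init

print(b.flatten())  # [ 1.  0.  0.  0.  1.  0.]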
@AdrianNunez OK fair enough. But how would you go about feeding that x into the ResNet50's input?
Something like this?
from keras.applications.resnet50 import ResNet50
net_output = ResNet50(weights='imagenet', include_top=False)(x)
But this construction gives me an error:
x = SpatialTransformer(localization_net=locnet, downsample_factor=1, return_theta=False)(data)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 487, in __call__
self.build(input_shapes[0])
File "/usr/local/lib/python2.7/dist-packages/seya/layers/attention.py", line 43, in build
self.locnet.build(input_shape)
AttributeError: 'TensorVariable' object has no attribute 'build'
The function ResNet50 returns a model. You first have to get the model by calling the function, and then use that model with your own data.
data = Input(...)
...
resnet50 = ResNet50(weights='imagenet', include_top=False, input_tensor=data)(x)
net_output = resnet50.output
x = Flatten()(net_output)
...
or
data = Input(...)
...
resnet50 = ResNet50(weights='imagenet', include_top=False)(x)
net_output = resnet50(data)
x = Flatten()(net_output)
...
In the first case I use the input_tensor parameter to specify the input of the network and then I take the actual output of the network. In the second case I create the model and then I use it with my own data.
Using your first construction like so (with the ... denoting the locnet definition):
def my_net(input_shape, num_categories):
    data = Input(shape=input_shape, dtype='float32', name='_input')
    ...
    x = SpatialTransformer(localization_net=locnet, downsample_factor=1, return_theta=False)(data)
    r50 = ResNet50(weights='imagenet', include_top=False, input_tensor=data)(x)
    net_output = r50.output
    Z = Flatten()(net_output)
    Z = Dense(num_categories, activation='softmax')(Z)
    model = Model(input=data, output=Z)
    model.compile(optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0), loss='categorical_crossentropy')
    return model
And using the second construction like so:
def my_net(input_shape, num_categories):
    data = Input(shape=input_shape, dtype='float32', name='_input')
    ...
    x = SpatialTransformer(localization_net=locnet, downsample_factor=1, return_theta=False)(data)
    r50 = ResNet50(weights='imagenet', include_top=False)(x)
    net_output = r50(data)
    Z = Flatten()(net_output)
    Z = Dense(num_categories, activation='softmax')(Z)
    model = Model(input=data, output=Z)
    model.compile(optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0), loss='categorical_crossentropy')
    return model
I get the error:
File "/home/qwerty/neural_nets/spatial_transformer.py", line 97, in my_net
x = SpatialTransformer(localization_net=locnet, downsample_factor=1, return_theta=False)(data)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 487, in __call__
self.build(input_shapes[0])
File "/usr/local/lib/python2.7/dist-packages/seya/layers/attention.py", line 43, in build
self.locnet.build(input_shape)
AttributeError: 'TensorVariable' object has no attribute 'build'
My seya comes from the keras1 branch, and I use Keras 1.1.0.
The locnet that you pass to the SpatialTransformer layer should be a model, not a tensor. That means it should not end in something like this:
locnet = Dense(6, weights=weights)(locnet)
x = SpatialTransformer(localization_net=locnet, downsample_factor=1, return_theta=False)(data)
But rather:
locnet = Dense(6, weights=weights)(locnet)
locnet_model = Model(input=data, output=locnet)
x = SpatialTransformer(localization_net=locnet_model, downsample_factor=1, return_theta=False)(data)
...
The "AttributeError: 'TensorVariable' object has no attribute 'build'" error refers to this.
Apart from this, and this was my copy-pasting mistake, this line is not correct:
r50 = ResNet50(weights='imagenet', include_top=False)(x)
It should be:
r50 = ResNet50(weights='imagenet', include_top=False)
That is, you don't include the input at the end between parentheses. In fact, the input is then given in the following line:
net_output = r50(data)
My apologies for this error.
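Putting the two corrections together, here is a minimal, untested sketch of how I understand the whole thing should be wired up. It assumes Keras 1.x with the Theano dim ordering used above, an input_shape that matches what ResNet50 expects (e.g. (3, 224, 224)), and that, since the goal is to put the transformer BEFORE the convnet, the ResNet50 model is called on the transformer output x rather than on the raw data:

from seya.layers.attention import SpatialTransformer
import numpy as np
from keras.layers import Dense, Activation, Flatten, Input
from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.optimizers import Adam

def my_net(input_shape, num_categories):
    data = Input(shape=input_shape, dtype='float32', name='_input')
    # dummy locnet, initialised to the identity affine transform
    b = np.zeros((2, 3), dtype='float32')
    b[0, 0] = 1
    b[1, 1] = 1
    W = np.zeros((50, 6), dtype='float32')
    weights = [W, b.flatten()]
    locnet = Flatten()(data)
    locnet = Dense(50)(locnet)
    locnet = Activation('relu')(locnet)
    locnet = Dense(6, weights=weights)(locnet)
    # the localization_net must be a Model, not a tensor
    locnet_model = Model(input=data, output=locnet)
    # spatial transformer applied to the input image
    x = SpatialTransformer(localization_net=locnet_model, downsample_factor=1, return_theta=False)(data)
    # create the pretrained model first, then call it on the transformed tensor
    r50 = ResNet50(weights='imagenet', include_top=False)
    net_output = r50(x)
    Z = Flatten()(net_output)
    Z = Dense(num_categories, activation='softmax')(Z)
    model = Model(input=data, output=Z)
    model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
    return model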
Is it resolved?