keras-attention-augmented-convs
Request for better example code
I tried using:
ip = (img_height, img_width, 3)
model = Sequential()
model.add(augmented_conv2d(ip, 8))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
and I get an error in the model on the line with Activation('relu').
Sequential expects Layers, whereas augmented_conv2d expects Tensors as input. You cannot call augmented_conv2d(...) with a plain Python tuple of the input shape; it requires a Keras Input layer or a Keras tensor as input.
Basically, you cannot use this with Sequential models. It must be used with Functional models or with Model subclassing.
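For example, a minimal sketch of the Functional usage (assuming a tf.keras setup; the shapes and hyperparameters here are just illustrative, chosen so that filters * depth_k is divisible by num_heads):

from tensorflow.keras.layers import Input, Activation, MaxPooling2D
from tensorflow.keras.models import Model
from attn_augconv import augmented_conv2d

ip = Input(shape=(32, 32, 3))  # a Keras tensor, not a plain Python tuple
x = augmented_conv2d(ip, filters=16, depth_k=0.5, depth_v=0.5, num_heads=4)  # 16 * 0.5 = 8, divisible by 4 heads
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
model = Model(inputs=ip, outputs=x)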
I think the example of using ip = Input(...) explains clearly that ip is the Input Layer in the Functional API.
If you have any ideas to make it clearer, do share and I'll incorporate it in the readme
I think a small, complete example without the '...' would help. For example, this is what I get now when trying to use the Functional API:
from attn_augconv import augmented_conv2d
input_shape = (700, 250, 3)
ip = Input(input_shape)

X1 = augmented_conv2d(ip, 8)
X = Activation('relu')(X1)
X = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(X)
X = Flatten()(X)
X = Dense(1024, activation='relu')(X)
X = Dense(1, activation='sigmoid')(X)

model = Model(inputs=X1, outputs=X)
model.summary()
model.compile(loss='binary_crossentropy', optimizer='Adadelta', metrics=['accuracy'])
Traceback (most recent call last):
File "a2.py", line 104, in
OK, so that is a combination of several issues.
First and foremost, when using a function, understand what the default values represent. You are using the default values of depth_k and depth_v by not passing them as keyword args.
Let's look at the default signature:
def augmented_conv2d(ip, filters, kernel_size=(3, 3), strides=(1, 1),
depth_k=0.2, depth_v=0.2, num_heads=8, relative_encodings=True):
"""
Builds an Attention Augmented Convolution block.
Args:
ip: keras tensor.
filters: number of output filters.
kernel_size: convolution kernel size.
strides: strides of the convolution.
depth_k: float or int. Number of filters for k.
depth_v: float or int. Number of filters for v.
num_heads: int. Number of attention heads.
relative_encodings: bool. Whether to use relative
encodings or not.
Returns:
a keras tensor.
"""
In hindsight, I should probably have made the computation of depth_k and depth_v relative to filters more explicit.
When passed as floats, they are mapped as filters * depth_k and filters * depth_v (it was a bug that it was previously computing ip_filters * depth_k/v).
Now onto your mistake. Even with the bugfix, this code will not run. Look at the default values of depth_k and depth_v: if you compute 0.2 * 8, you get 1.6, which will raise a ValueError inside AttentionAugmentation2D stating that depth_k is not divisible by num_heads (which defaults to 8 here!).
Easy fixes (see the corrected sketch after this list):
- Increase filters from 8 to 10 and decrease num_heads to 1 or 2 to get a divisible number (10 * 0.2 = 2; 2 is divisible by 1 and by 2).
- Increase depth_k = depth_v from 0.2 to 0.5 and set num_heads to 1, 2, or 4 to get a divisible number (8 * 0.5 = 4; 4 is divisible by 1, 2, and 4).
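For reference, a sketch of the model above with the second fix applied (assuming a tf.keras setup; note also that inputs must be the Input tensor ip, not the intermediate tensor X1 as in the original snippet):

from tensorflow.keras.layers import Input, Activation, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model
from attn_augconv import augmented_conv2d

ip = Input(shape=(700, 250, 3))
x = augmented_conv2d(ip, 8, depth_k=0.5, depth_v=0.5, num_heads=4)  # 8 * 0.5 = 4, divisible by 4 heads
x = Activation('relu')(x)
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = Flatten()(x)
x = Dense(1024, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=ip, outputs=x)  # inputs must be the Input tensor, not X1
model.compile(loss='binary_crossentropy', optimizer='Adadelta', metrics=['accuracy'])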
On my part, I can add some quick checks when building the model itself that raise an error if incorrect inputs are passed.
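For instance, something along these lines (a hypothetical sketch of such a check, not the committed code):

def check_head_divisibility(filters, depth_k, depth_v, num_heads):
    # resolve float ratios into absolute channel counts, relative to `filters`
    dk = int(filters * depth_k) if isinstance(depth_k, float) else depth_k
    dv = int(filters * depth_v) if isinstance(depth_v, float) else depth_v
    if dk % num_heads != 0:
        raise ValueError('depth_k (%d) must be divisible by num_heads (%d)' % (dk, num_heads))
    if dv % num_heads != 0:
        raise ValueError('depth_v (%d) must be divisible by num_heads (%d)' % (dv, num_heads))
    return dk, dv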
These docstring improvements and the normalization bugfix were added via 7680cdcd78fdca0260d1ebdde36ca06231f908c9.
I'll keep this issue open in case anyone has similar issues in the future.
I tried to use the same model as @aletote that you already fixed, but when I compiled and fit my data, it showed the error:
tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.72TiB
I am very confused, because the total params count is only 5,763,033.
PS: my input size is (299, 299, 3) with batch_size = 64.
Now, I changed batch_size to 8 and my input size to (32, 32, 3). It can successfully compile and fit the training data with around 60k total params. But I still don't understand why it allocates so much GPU memory.
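A likely explanation (my understanding; I have not traced the exact 3.72TiB allocation): the parameter count is small, but self-attention flattens the feature map into H * W positions and materializes attention logits of shape (batch, num_heads, H*W, H*W), so activation memory grows quadratically with the number of spatial positions, independently of the parameter count. A rough back-of-the-envelope check in float32:

h, w = 299, 299
positions = h * w                  # 89,401 spatial positions
logits_bytes = positions ** 2 * 4  # attention matrix for ONE head of ONE sample
print(logits_bytes / 2 ** 30)      # ~29.8 GiB
# multiplied by num_heads and batch_size this reaches the TiB range, whereas
# a 32x32 input has only 1,024 positions (~4 MiB per head per sample)

Applying the attention-augmented convolution only on smaller feature maps (i.e. after some downsampling) keeps this manageable.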