image-similarity-deep-ranking
About SubSample
Hey Akarsh, thank you for sharing your great work! I'm new to deep learning, and what confuses me is that you did not implement sub-sampling but enlarged the strides instead. From what I have learned, these are not equivalent, right? Does it have the same or a better effect? Is there some special consideration or experience behind this choice?
@zyq001 Did you solve this problem? Please let me know.
Hey Mike and Longzeyilang! Sub-sampling is the same as strides.
"subsample: tuple of length 2. Factor by which to subsample output. Also called strides elsewhere." - https://faroit.github.io/keras-docs/1.2.2/layers/convolutional/#convolution2d
@akarshzingade Thank you for your reply! There is a difference between your implementation and mine. According to "Learning Fine-grained Image Similarity with Deep Ranking" and the multiscale network structure of Figure 3, your implementation may have a problem:

def deep_rank_model():
    convnet_model = convnet_model_()

    first_input = Input(shape=(56, 56, 3))
    first_conv = Conv2D(96, kernel_size=(8, 8), strides=(4, 4), padding='same')(first_input)
    first_max = MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same')(first_conv)
    first_max = Flatten()(first_max)
    first_max = Lambda(lambda x: K.l2_normalize(x, axis=1))(first_max)

    second_input = Input(shape=(28, 28, 3))
    second_conv = Conv2D(96, kernel_size=(8, 8), strides=(4, 4), padding='same')(second_input)
    second_max = MaxPool2D(pool_size=(7, 7), strides=(4, 4), padding='same')(second_conv)
    second_max = Flatten()(second_max)
    second_max = Lambda(lambda x: K.l2_normalize(x, axis=1))(second_max)

    merge_one = concatenate([first_max, second_max])
    merge_two = concatenate([merge_one, convnet_model.output])
    emb = Dense(4096)(merge_two)
    emb = Dropout(0.6)(emb)
    l2_norm_final = Lambda(lambda x: K.l2_normalize(x, axis=1))(emb)

    final_model = Model(inputs=[first_input, second_input, convnet_model.input], outputs=l2_norm_final)
    return final_model
Hey, Longzeyilang. I believe the implementation does follow the architecture shown in Figure 3. Please let me know what the difference is :)
@akarshzingade @longzeyilang I have tried both implementations using the corrected triplet loss function, i.e. the one in the paper and the one by Akarsh. I don't know why, but I am getting better results with Akarsh's implementation on the Exact Street2Shop dataset. However, the results are still not good and I am looking for further improvement.
@akarshzingade I think there is a difference between your implementation and the network given in the paper. In your implementation, each of the three networks is fed an image of (224, 224, 3) and the stride of the max-pooling kernel is greater than the kernel size, so some pixel positions are always ignored by the max-pooling kernel. In the paper, the input image size is different for each of the networks, i.e. (224, 224, 3), (56, 56, 3) and (28, 28, 3), and the stride is less than the kernel size.
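To make the "ignored pixel positions" point concrete, here is a small illustrative sketch (plain Python, 1-D, no padding; not code from this repository) that lists which input positions are never covered by any pooling window in the two settings:

def uncovered_positions(length, kernel, stride):
    """Input positions that no pooling window ever touches (1-D, no padding)."""
    covered = set()
    start = 0
    while start + kernel <= length:
        covered.update(range(start, start + kernel))
        start += stride
    return sorted(set(range(length)) - covered)

# stride (4) larger than the kernel (3): some positions are never pooled
print(uncovered_positions(15, kernel=3, stride=4))   # [3, 7, 11]
# stride (2) smaller than the kernel (3): every position is covered
print(uncovered_positions(15, kernel=3, stride=2))   # []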
@longzeyilang, I tried your version of def deep_rank_model(): but got the following error: Error when checking input: expected input_2 to have shape (56, 56, 3) but got array with shape (224, 224, 3).
Is there any other place in the code I need to adjust? Thanks a lot!
@ha121ppy You will have to change the next method. This works for me:
def next(self):
    """For python 2.x.

    # Returns
        The next batch.
    """
    with self.lock:
        index_array, current_index, current_batch_size = next(self.index_generator)
    # The transformation of images is not under thread lock
    # so it can be done in parallel
    batch_x = np.zeros((current_batch_size,) + self.image_shape, dtype=K.floatx())
    batch_x_1 = np.zeros((current_batch_size,) + (57, 57, 3), dtype=K.floatx())
    batch_x_2 = np.zeros((current_batch_size,) + (29, 29, 3), dtype=K.floatx())
    grayscale = self.color_mode == 'grayscale'
    for i, j in enumerate(index_array):
        fname = self.filenames[j]
        img = load_img(os.path.join(self.directory, fname.split('\r')[0]),
                       grayscale=grayscale,
                       target_size=self.target_size)
        img_1 = img.resize((57, 57))
        img_2 = img.resize((29, 29))
        x = img_to_array(img, data_format=self.data_format)
        x_1 = img_to_array(img_1, data_format=self.data_format)
        x_2 = img_to_array(img_2, data_format=self.data_format)
        x = self.image_data_generator.random_transform(x)
        x_1 = self.image_data_generator.random_transform(x_1)
        x_2 = self.image_data_generator.random_transform(x_2)
        x = self.image_data_generator.standardize(x)
        x_1 = self.image_data_generator.standardize(x_1)
        x_2 = self.image_data_generator.standardize(x_2)
        batch_x[i] = x
        batch_x_1[i] = x_1
        batch_x_2[i] = x_2
    # optionally save augmented images to disk for debugging purposes
    if self.save_to_dir:
        for i in range(current_batch_size):
            img = array_to_img(batch_x[i], self.data_format, scale=True)
            fname = '{prefix}_{index}_{hash}.{format}'.format(prefix=self.save_prefix,
                                                              index=current_index + i,
                                                              hash=np.random.randint(1e4),
                                                              format=self.save_format)
            img.save(os.path.join(self.save_to_dir, fname))
    # build batch of labels
    if self.class_mode == 'input':
        batch_y = batch_x.copy()
    elif self.class_mode == 'sparse':
        batch_y = self.classes[index_array]
    elif self.class_mode == 'binary':
        batch_y = self.classes[index_array].astype(K.floatx())
    elif self.class_mode == 'categorical':
        batch_y = np.zeros((len(batch_x), self.num_class), dtype=K.floatx())
    else:
        return batch_x
    return [batch_x_1, batch_x_2, batch_x], batch_y
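For context, a rough sketch of how this generator is meant to plug into the multiscale model; deep_rank_model, triplt_loss and train_generator stand for the pieces defined elsewhere in this thread, and the two auxiliary Input layers must be built with shapes (57, 57, 3) and (29, 29, 3) to match the batches this next() yields (the model snippet earlier in the thread used 56x56 and 28x28, so one of the two has to be adjusted):

# Assumes deep_rank_model() was changed so its auxiliary inputs are
# Input(shape=(57, 57, 3)) and Input(shape=(29, 29, 3)).
model = deep_rank_model()
model.compile(loss=triplt_loss, optimizer='adam')

# train_generator is an iterator whose patched next() returns
# ([batch_57, batch_29, batch_224], batch_y), matching the model's input order.
model.fit_generator(train_generator, steps_per_epoch=100, epochs=10)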
I have already tested the above code. In fact, I changed the triplet loss and used the centre loss function instead.
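The comment above does not show the centre loss that was used; for readers unfamiliar with the term, the usual centre loss (Wen et al., 2016) penalises the distance between each embedding and its class centre, roughly as in this illustrative sketch (names here are not from this repository):

from keras import backend as K

def center_loss(features, labels, centers):
    """Half the mean squared distance between each embedding and its class centre.
    `centers` is a (num_classes, embedding_dim) variable updated alongside the
    network; here it is only read, not updated."""
    centers_batch = K.gather(centers, K.cast(labels, 'int32'))
    return 0.5 * K.mean(K.sum(K.square(features - centers_batch), axis=1))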
@IAmAbdusKhan, @longzeyilang Thanks!
@longzeyilang What do you mean by "the centre loss function"? I have adjusted the loss function as follows; is this what you mean?
def triplt_loss(y_true, y_pred):
    y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    loss = tf.convert_to_tensor(0, dtype=tf.float32)
    total_loss = tf.convert_to_tensor(0, dtype=tf.float32)
    g = tf.constant(1.0, shape=[1], dtype=tf.float32)
    zero = tf.constant(0.0, shape=[1], dtype=tf.float32)
    for i in range(0, batch_size, 3):
        try:
            q_embedding = y_pred[i]
            p_embedding = y_pred[i + 1]
            n_embedding = y_pred[i + 2]
            D_q_p = K.sqrt(K.sum((q_embedding - p_embedding)**2))
            D_q_n = K.sqrt(K.sum((q_embedding - n_embedding)**2))
            loss = tf.maximum(g + D_q_p - D_q_n, zero)
            total_loss = total_loss + loss
        except:
            continue
    total_loss = total_loss / (batch_size / 3)
    return total_loss
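For comparison, a vectorized sketch of the same hinge loss, assuming the batch is laid out as consecutive (query, positive, negative) triplets and the same gap g = 1.0; this avoids the Python-level loop and the bare except (illustrative, not the loss used in this repository):

from keras import backend as K

def triplet_hinge_loss(y_true, y_pred, g=1.0):
    # Batch ordered as [q0, p0, n0, q1, p1, n1, ...]
    q = y_pred[0::3]
    p = y_pred[1::3]
    n = y_pred[2::3]
    d_qp = K.sqrt(K.sum(K.square(q - p), axis=1))
    d_qn = K.sqrt(K.sum(K.square(q - n), axis=1))
    # Hinge on the ranking margin, averaged over triplets
    return K.mean(K.maximum(g + d_qp - d_qn, 0.0))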
Besides, I have run into a problem: after data augmentation (I generate about 20 transformations from each image), the model rarely improves. Do you have any idea why? I am trying to adjust the network structure, but I doubt it will help. My goal is for a transformed image to have the smallest distance to its raw image, but even when I put these pairs in the training data, the model still treats them as different images (large distance).
@ha121ppy Have you found anything to improve the accuracy?