image-similarity-deep-ranking

About SubSample

Open zyq001 opened this issue 6 years ago • 11 comments

Hey Akarsh, thank you for sharing your great work! I'm new to deep learning, and what confuses me is that you did not use the subsample argument but instead enlarged the strides. From what I have learned, the two are not equivalent, right? Does your version have the same or a better effect? Is there some special consideration or experience behind it?

zyq001 avatar Jun 12 '18 09:06 zyq001

@zyq001 did you solve this problem? Please let me know.

longzeyilang avatar Jul 01 '18 06:07 longzeyilang

Hey Mike and Longzeyilang! Sub-sampling is the same as strides.

"subsample: tuple of length 2. Factor by which to subsample output. Also called strides elsewhere."- https://faroit.github.io/keras-docs/1.2.2/layers/convolutional/#convolution2d

akarshzingade avatar Jul 03 '18 13:07 akarshzingade

@akarshzingade Thank you for your reply! There is a difference between your implementation and mine. According to Figure 3 of "Learning Fine-grained Image Similarity with Deep Ranking" (the multiscale network structure), your implementation may have a problem. Here is my version:

from keras.layers import Input, Conv2D, MaxPool2D, Flatten, Lambda, Dense, Dropout, concatenate
from keras.models import Model
from keras import backend as K

def deep_rank_model():
    convnet_model = convnet_model_()  # deep branch, defined elsewhere (see the sketch below)

    # Shallow branch 1: 56x56 input, per Figure 3 of the paper
    first_input = Input(shape=(56, 56, 3))
    first_conv = Conv2D(96, kernel_size=(8, 8), strides=(4, 4), padding='same')(first_input)
    first_max = MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same')(first_conv)
    first_max = Flatten()(first_max)
    first_max = Lambda(lambda x: K.l2_normalize(x, axis=1))(first_max)

    # Shallow branch 2: 28x28 input
    second_input = Input(shape=(28, 28, 3))
    second_conv = Conv2D(96, kernel_size=(8, 8), strides=(4, 4), padding='same')(second_input)
    second_max = MaxPool2D(pool_size=(7, 7), strides=(4, 4), padding='same')(second_conv)
    second_max = Flatten()(second_max)
    second_max = Lambda(lambda x: K.l2_normalize(x, axis=1))(second_max)

    # Merge both shallow branches with the deep branch and embed
    merge_one = concatenate([first_max, second_max])
    merge_two = concatenate([merge_one, convnet_model.output])
    emb = Dense(4096)(merge_two)
    emb = Dropout(0.6)(emb)
    l2_norm_final = Lambda(lambda x: K.l2_normalize(x, axis=1))(emb)

    final_model = Model(inputs=[first_input, second_input, convnet_model.input],
                        outputs=l2_norm_final)
    return final_model
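The snippet above assumes a convnet_model_() helper for the deep branch. As a minimal stand-in that makes it buildable, here is a sketch assuming a VGG16-style deep branch; this is an illustration, not necessarily the repo's exact code:

from keras.applications.vgg16 import VGG16
from keras.layers import Flatten, Lambda, Dense, Dropout
from keras.models import Model
from keras import backend as K

def convnet_model_():
    # Hypothetical deep branch: VGG16 trunk with an L2-normalized 4096-d head
    base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
    x = Flatten()(base.output)
    x = Dense(4096, activation='relu')(x)
    x = Dropout(0.6)(x)
    x = Lambda(lambda t: K.l2_normalize(t, axis=1))(x)
    return Model(inputs=base.input, outputs=x)

model = deep_rank_model()
model.summary()  # three inputs: (56, 56, 3), (28, 28, 3), (224, 224, 3)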

longzeyilang avatar Jul 04 '18 05:07 longzeyilang

Hey, Longzeyilang. I believe the implementation does follow the architecture shown in Figure 3. Please let me know what the difference is :)

akarshzingade avatar Jul 06 '18 06:07 akarshzingade

@akarshzingade @longzeyilang I have tried both implementations using the corrected triplet loss function, i.e. the one in the paper, and also the one by Akarsh. I don't know why, but I am getting better results with Akarsh's implementation on the Exact Street2Shop dataset. However, the results are still not good and I am looking for further improvement.

IAmAbdusKhan avatar Jul 09 '18 11:07 IAmAbdusKhan

@akarshzingade I think there is a difference between your implementation and the network given in the paper. In your implementation, each of the three networks is fed a (224, 224, 3) image, and the stride of the max-pool kernel is greater than the kernel size, so some pixel positions are always ignored by the max-pool kernel. In the paper, by contrast, the input image size is different for each network, i.e. (224, 224, 3), (56, 56, 3) and (28, 28, 3), and the stride is less than the kernel size. See the sketch below.
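To make the stride-versus-kernel point concrete, here is a small 1-D sketch (the sizes are illustrative, not taken from either implementation) showing that a window whose stride exceeds its size skips input positions:

# Which input positions does a 1-D pooling window of size k, stride s, never touch?
def skipped_positions(length, k, s):
    covered = set()
    for start in range(0, length - k + 1, s):
        covered.update(range(start, start + k))
    return sorted(set(range(length)) - covered)

# stride > kernel: interior positions are never pooled
print(skipped_positions(length=16, k=2, s=4))  # [2, 3, 6, 7, 10, 11, 14, 15]
# stride < kernel (as in the paper): every position is covered
print(skipped_positions(length=16, k=4, s=2))  # []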

IAmAbdusKhan avatar Jul 09 '18 20:07 IAmAbdusKhan

@longzeyilang, I tried your version of deep_rank_model() but got the following error: Error when checking input: expected input_2 to have shape (56, 56, 3) but got array with shape (224, 224, 3)

Is there any other place in the code I need to adjust? Thanks a lot.

ha121ppy avatar Jul 23 '18 08:07 ha121ppy

@ha121ppy you will have to change the next method of the image iterator as well, so that it yields all three input sizes. This works for me:

def next(self):
    """For python 2.x.
    # Returns
        The next batch.
    """
    with self.lock:
        index_array, current_index, current_batch_size = next(self.index_generator)
    # The transformation of images is not under thread lock
    # so it can be done in parallel
    batch_x = np.zeros((current_batch_size,) + self.image_shape, dtype=K.floatx())
    # Extra batches for the two shallow branches. Note: (57, 57) and (29, 29)
    # must match the shallow-branch Input shapes of your model; change them to
    # (56, 56) and (28, 28) if you use the deep_rank_model() posted above.
    batch_x_1 = np.zeros((current_batch_size,) + (57, 57, 3), dtype=K.floatx())
    batch_x_2 = np.zeros((current_batch_size,) + (29, 29, 3), dtype=K.floatx())

    grayscale = self.color_mode == 'grayscale'

    for i, j in enumerate(index_array):
        fname = self.filenames[j]

        img = load_img(os.path.join(self.directory, fname.split('\r')[0]),
                       grayscale=grayscale,
                       target_size=self.target_size)

        # Downscale the full-size image for the two shallow branches
        img_1 = img.resize((57, 57))
        img_2 = img.resize((29, 29))

        x = img_to_array(img, data_format=self.data_format)
        x_1 = img_to_array(img_1, data_format=self.data_format)
        x_2 = img_to_array(img_2, data_format=self.data_format)

        # NB: each random_transform call draws fresh random parameters,
        # so the three scales can receive different augmentations
        x = self.image_data_generator.random_transform(x)
        x_1 = self.image_data_generator.random_transform(x_1)
        x_2 = self.image_data_generator.random_transform(x_2)

        x = self.image_data_generator.standardize(x)
        x_1 = self.image_data_generator.standardize(x_1)
        x_2 = self.image_data_generator.standardize(x_2)

        batch_x[i] = x
        batch_x_1[i] = x_1
        batch_x_2[i] = x_2

    # optionally save augmented images to disk for debugging purposes
    if self.save_to_dir:
        for i in range(current_batch_size):
            img = array_to_img(batch_x[i], self.data_format, scale=True)
            fname = '{prefix}_{index}_{hash}.{format}'.format(prefix=self.save_prefix,
                                                              index=current_index + i,
                                                              hash=np.random.randint(1e4),
                                                              format=self.save_format)
            img.save(os.path.join(self.save_to_dir, fname))
    # build batch of labels
    if self.class_mode == 'input':
        batch_y = batch_x.copy()
    elif self.class_mode == 'sparse':
        batch_y = self.classes[index_array]
    elif self.class_mode == 'binary':
        batch_y = self.classes[index_array].astype(K.floatx())
    elif self.class_mode == 'categorical':
        batch_y = np.zeros((len(batch_x), self.num_class), dtype=K.floatx())
        # one-hot encode the labels (this fill was missing from the original snippet)
        for i, label in enumerate(self.classes[index_array]):
            batch_y[i, label] = 1.
    else:
        return batch_x

    # Order matters: the list must match the model's input order,
    # i.e. [first_input, second_input, convnet_model.input]
    return [batch_x_1, batch_x_2, batch_x], batch_y

IAmAbdusKhan avatar Jul 23 '18 08:07 IAmAbdusKhan

I have already tested the code above. In fact, I changed the triplet loss and used the centre loss function instead.

longzeyilang avatar Jul 23 '18 11:07 longzeyilang

@IAmAbdusKhan, @longzeyilang Thanks! @longzeyilang, what do you mean by 'the centre loss function'? I have adjusted the loss function as follows; is that what you mean?

def triplt_loss(y_true, y_pred):
    y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    loss = tf.convert_to_tensor(0, dtype=tf.float32)
    total_loss = tf.convert_to_tensor(0, dtype=tf.float32)
    g = tf.constant(1.0, shape=[1], dtype=tf.float32)
    zero = tf.constant(0.0, shape=[1], dtype=tf.float32)
    for i in range(0, batch_size, 3):
        try:
            q_embedding = y_pred[i]
            p_embedding = y_pred[i + 1]
            n_embedding = y_pred[i + 2]
            D_q_p = K.sqrt(K.sum((q_embedding - p_embedding) ** 2))
            D_q_n = K.sqrt(K.sum((q_embedding - n_embedding) ** 2))
            loss = tf.maximum(g + D_q_p - D_q_n, zero)
            total_loss = total_loss + loss
        except:
            continue
    total_loss = total_loss / (batch_size / 3)
    return total_loss
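As a side note, the per-sample Python loop above only works with a fixed batch size; here is a vectorized sketch of the same loss, assuming triplets are packed as consecutive (query, positive, negative) rows as in this thread:

from keras import backend as K

def triplet_loss_vectorized(y_true, y_pred, g=1.0):
    # rows packed as [query, positive, negative, query, ...]
    q = y_pred[0::3]
    p = y_pred[1::3]
    n = y_pred[2::3]
    d_qp = K.sqrt(K.sum(K.square(q - p), axis=1))  # query-positive distances
    d_qn = K.sqrt(K.sum(K.square(q - n), axis=1))  # query-negative distances
    return K.mean(K.maximum(g + d_qp - d_qn, 0.0))  # hinge with margin g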

Besides, I have run into a problem: after data augmentation (I generate about 20 transformations of each image), the model barely improves. Do you have any idea why? I am trying to adjust the network structure, but I doubt that will help. My goal is for a transformed image to have the smallest distance to its raw image, but even when I put both in the training data, the model still treats them as different images (large distance).

ha121ppy avatar Jul 24 '18 03:07 ha121ppy

@ha121ppy Have you found anything to improve the accuracy?

christophesmet avatar Aug 21 '18 17:08 christophesmet