
Parameters Clarification

Status: Open. WHBSmith opened this issue 5 years ago (0 comments).

Hi, first of all, thank you for making this code public! I am trying to reproduce some of the experiments from the NetVLAD paper. To do this, I'd like to take the VLAD layer as defined here and append it to the end of a VGG16 network, as is done in the paper. With its fully connected (classification) layers removed, VGG16 outputs a feature map of shape 7×7×512 for a 224×224 input. However, I can't figure out how to pass this to the VLAD layer as defined here. There appear to be four input parameters: feature_size, max_samples, cluster_size and output_dim.
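One reading of the paper's formulation is that each spatial position of the conv map is one local descriptor, so a 7×7×512 map would give N = 49 descriptors of dimension D = 512. A minimal NumPy sketch of that reshape, assuming a channels-last layout; `to_local_descriptors` is a hypothetical name, not part of this repository:

```python
import numpy as np

def to_local_descriptors(feature_map):
    """Treat each spatial cell of a conv feature map as one local descriptor.

    feature_map: (batch, H, W, D) array, channels-last (the Keras VGG16 layout).
    Returns a (batch, H*W, D) array, i.e. N = H*W descriptors of size D per image.
    """
    batch, h, w, d = feature_map.shape
    return feature_map.reshape(batch, h * w, d)

# Stand-in for the output of VGG16 without its fully connected layers:
# a batch of 2 images, each a 7x7 grid of 512-dimensional descriptors.
conv_out = np.random.rand(2, 7, 7, 512).astype(np.float32)
descriptors = to_local_descriptors(conv_out)
print(descriptors.shape)  # (2, 49, 512)

# Some implementations instead expect every descriptor in the batch stacked
# into a single (batch*N, D) matrix.
flat = descriptors.reshape(-1, descriptors.shape[-1])
print(flat.shape)  # (98, 512)
```

Which of the two layouts the layer here expects is worth confirming against its docstring.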

The paper gives the following overview: "Formally, given N D-dimensional local image descriptors as input, and K cluster centres (“visual words”) as VLAD parameters, the output VLAD image representation V is K×D-dimensional. For convenience we will write V as a K×D matrix, but this matrix is converted into a vector and, after normalization, used as the image representation".
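For reference, the paper builds V from a soft assignment of each descriptor x_i to each centre c_k: V(j,k) = sum_i a_k(x_i) (x_i(j) - c_k(j)), where a_k(x_i) is a softmax over k of w_k^T x_i + b_k, followed by intra-normalization (per cluster), flattening, and a final L2 normalization. The NumPy sketch below follows that description for a single image; the function and argument names are illustrative, and it is not this repository's implementation:

```python
import numpy as np

def netvlad(x, centers, w, b):
    """NetVLAD aggregation for one image, following the paper's equations.

    x:       (N, D) local descriptors
    centers: (K, D) cluster centres c_k
    w, b:    (D, K) and (K,) parameters of the soft-assignment layer
    Returns a vector of length K*D.
    """
    # Soft assignment a_k(x_i) = softmax_k(w_k^T x_i + b_k), shape (N, K).
    logits = x @ w + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)
    # Residual aggregation V[k] = sum_i a_k(x_i) * (x_i - c_k), shape (K, D).
    v = a.T @ x - a.sum(axis=0)[:, None] * centers
    # Intra-normalization: L2-normalize each cluster's residual sum.
    v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12
    # Flatten the K x D matrix and L2-normalize the whole vector.
    v = v.reshape(-1)
    return v / (np.linalg.norm(v) + 1e-12)

# Illustrative sizes: N = 7*7 descriptors, D = 512 channels, K = 64 clusters.
rng = np.random.default_rng(0)
N, D, K = 49, 512, 64
x = rng.standard_normal((N, D))
v = netvlad(x, rng.standard_normal((K, D)), rng.standard_normal((D, K)), np.zeros(K))
print(v.shape)  # (32768,)
```

In training, `centers`, `w` and `b` would be learnable parameters; random values are used here only to check shapes.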

feature_size: should this be the flattened feature dimension (7×7×512 = 25088) or just 512? Is this the D from the paper?

max_samples: I can't find an explicit mention of this parameter anywhere except in this code.

cluster_size: the number of clusters (K in the original paper).

output_dim: presumably this is K×D?
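Under one plausible reading of the four arguments (an assumption, not confirmed from the code): feature_size is D (512, the channel count, rather than 7×7×512), max_samples is N (the 7×7 = 49 spatial positions), cluster_size is K, and output_dim is K×D unless the layer applies an extra projection to a smaller size. The hypothetical helper below (`VladParams` and `params_from_conv_shape` are illustrative names, not this repository's API) just makes that arithmetic explicit:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VladParams:
    """Hypothetical container mirroring the four constructor arguments."""
    feature_size: int  # D: size of each local descriptor (conv channels)
    max_samples: int   # N: descriptors per image (assumed meaning)
    cluster_size: int  # K: number of cluster centres
    output_dim: int    # size of the final vector (assumed meaning)

    @property
    def vlad_dim(self) -> int:
        # Length of the flattened K x D VLAD matrix, before any projection.
        return self.cluster_size * self.feature_size

def params_from_conv_shape(h: int, w: int, d: int, k: int,
                           output_dim: Optional[int] = None) -> VladParams:
    """Derive the parameters from an H x W x D conv map and a chosen K."""
    return VladParams(
        feature_size=d,
        max_samples=h * w,
        cluster_size=k,
        # Without a projection the output is simply the K*D VLAD vector.
        output_dim=k * d if output_dim is None else output_dim,
    )

p = params_from_conv_shape(7, 7, 512, k=64)
print(p)  # VladParams(feature_size=512, max_samples=49, cluster_size=64, output_dim=32768)
```

If output_dim is set smaller than K×D, the layer would need some way to reduce dimensionality, for example a learned linear projection; whether this code does that is worth checking in its source.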

Some clarification of this would be much appreciated.

WHBSmith, Dec 09 '19 16:12