I think the code is wrong
I compared the code in this repository with Facebook's version, and I find that this version is not the non-local net published at CVPR / on arXiv. This is Facebook's source code:
def spacetime_nonlocal(
        model, blob_in, dim_in, dim_out, batch_size, prefix, dim_inner,
        is_test, max_pool_stride=2):
    # ---------------------
    cur = blob_in
    # we do projection to convert each spacetime location to a feature
    # theta original size
    # e.g., (8, 1024, 4, 14, 14) => (8, 1024, 4, 14, 14)
    theta = model.ConvNd(
        cur, prefix + '_theta',
        dim_in,
        dim_inner,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)
    # phi and g: half spatial size
    # e.g., (8, 1024, 4, 14, 14) => (8, 1024, 4, 7, 7)
    if cfg.NONLOCAL.USE_MAXPOOL is True:
        max_pool = model.MaxPool(
            cur, prefix + '_pool',
            kernels=[1, max_pool_stride, max_pool_stride],
            strides=[1, max_pool_stride, max_pool_stride],
            pads=[0, 0, 0] * 2,
        )
    else:
        max_pool = cur
    phi = model.ConvNd(
        max_pool, prefix + '_phi',
        dim_in,
        dim_inner,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)
    g = model.ConvNd(
        max_pool, prefix + '_g',
        dim_in,
        dim_inner,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)
    # we have to use explicit batch size (to support arbitrary spacetime size)
    # e.g., (8, 1024, 4, 14, 14) => (8, 1024, 784)
    theta, theta_shape_5d = model.Reshape(
        theta, [theta + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else theta,
                theta + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    phi, phi_shape_5d = model.Reshape(
        phi, [phi + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else phi,
              phi + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    g, g_shape_5d = model.Reshape(
        g, [g + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else g,
            g + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    # e.g., (8, 1024, 784) * (8, 1024, 784) => (8, 784, 784)
    theta_phi = model.net.BatchMatMul([theta, phi], prefix + '_affinity', trans_a=1)
    if cfg.NONLOCAL.USE_SOFTMAX is True:
        if cfg.NONLOCAL.USE_SCALE is True:
            theta_phi_sc = model.Scale(theta_phi, theta_phi, scale=dim_inner**-.5)
        else:
            theta_phi_sc = theta_phi
        # softmax
        # sum(p[i, j, :]) == 1, for any i, j
        p = model.Softmax(theta_phi_sc, theta_phi + '_prob', engine='CUDNN', axis=2)
    else:
        ones = model.net.ConstantFill([theta_phi], [theta_phi + '_ones'], value=1.)
        ones = model.net.ReduceBackSum([ones], [theta_phi + '_const'])
        zeros = model.net.ConstantFill([theta_phi], [theta_phi + '_zeros'], value=0.)
        denom = model.net.Add(
            [zeros, ones], [theta_phi + '_denom'], broadcast=1, axis=0)
        model.StopGradient(denom, denom)
        p = model.net.Div([theta_phi, denom], [theta_phi + '_sc'])
    # note: g's axis[2] corresponds to p's axis[2]
    # e.g., g(8, 1024, 784_2) * p(8, 784_1, 784_2) => (8, 1024, 784_1)
    t = model.net.BatchMatMul([g, p], prefix + '_y', trans_b=1)
    # reshape back:
    # e.g., (8, 1024, 784) => (8, 1024, 4, 14, 14)
    t_re, t_shape = model.Reshape(
        [t, theta_shape_5d],
        [t + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else t,
         t + '_shape3d'])
    blob_out = t_re
    blob_out = model.ConvNd(
        blob_out, prefix + '_out',
        dim_inner,
        dim_out,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD})
        if not cfg.NONLOCAL.USE_ZERO_INIT_CONV else
        ('ConstantFill', {'value': 0.}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)
    if cfg.NONLOCAL.USE_BN is True:
        blob_out = model.SpatialBN(
            blob_out, prefix + "_bn", dim_out,
            epsilon=cfg.NONLOCAL.BN_EPSILON, momentum=cfg.NONLOCAL.BN_MOMENTUM,
            is_test=is_test
        )
        model.param_init_net.ConstantFill(
            [prefix + "_bn_s"], prefix + "_bn_s", value=cfg.NONLOCAL.BN_INIT_GAMMA)
    if cfg.NONLOCAL.USE_AFFINE is True:
        blob_out = model.AffineNd(blob_out, prefix + "_bn", dim_out)
    return blob_out
In fact, it uses a MatMul op rather than a conv op for the pairwise term.
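To spell out what that Caffe2 code computes, here is a minimal NumPy sketch (my own, not from either repository), assuming an (8, 1024, 4, 14, 14) input with dim_inner = 1024 and spatial max-pooling of phi and g as in the comments above:

import numpy as np

# batch, inner channels, theta locations (4*14*14), pooled phi/g locations (4*7*7)
B, C, N, M = 8, 1024, 784, 196
theta = np.random.randn(B, C, N)   # after Reshape: (8, 1024, 784)
phi = np.random.randn(B, C, M)     # pooled, then reshaped: (8, 1024, 196)
g = np.random.randn(B, C, M)

# BatchMatMul with trans_a=1: theta^T * phi -> pairwise affinities (8, 784, 196)
affinity = np.einsum('bcn,bcm->bnm', theta, phi)
affinity = affinity * C ** -0.5                      # the USE_SCALE branch
affinity -= affinity.max(axis=2, keepdims=True)      # numerical stability only
p = np.exp(affinity) / np.exp(affinity).sum(axis=2, keepdims=True)  # softmax over axis 2

# BatchMatMul with trans_b=1: g * p^T -> aggregated features (8, 1024, 784)
y = np.einsum('bcm,bnm->bcn', g, p)
print(y.shape)  # (8, 1024, 784), later reshaped back to (8, 1024, 4, 14, 14)

So the conv ops only do the 1x1x1 projections; the pairwise part is plain batched matrix multiplication.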
In fact, a conv op is mathematically a MatMul. In the file Non-Local_Nets-Tensorflow/ops.py you can find the implementation of NonLocalBlock; it uses conv and matmul ops too. It seems to be the embedded Gaussian version:

def NonLocalBlock(input_x, out_channels, sub_sample=True, is_bn=True, scope='NonLocalBlock'):
    batchsize, height, width, in_channels = input_x.get_shape().as_list()
    with tf.variable_scope(scope) as sc:
        with tf.variable_scope('g') as scope:
            g = slim.conv2d(input_x, out_channels, [1,1], stride=1, scope='g')
            if sub_sample:
                g = slim.max_pool2d(g, [2,2], stride=2, scope='g_max_pool')
        with tf.variable_scope('phi') as scope:
            phi = slim.conv2d(input_x, out_channels, [1,1], stride=1, scope='phi')
            if sub_sample:
                phi = slim.max_pool2d(phi, [2,2], stride=2, scope='phi_max_pool')
        with tf.variable_scope('theta') as scope:
            theta = slim.conv2d(input_x, out_channels, [1,1], stride=1, scope='theta')
        g_x = tf.reshape(g, [batchsize,out_channels, -1])
        g_x = tf.transpose(g_x, [0,2,1])
        theta_x = tf.reshape(theta, [batchsize, out_channels, -1])
        theta_x = tf.transpose(theta_x, [0,2,1])
        phi_x = tf.reshape(phi, [batchsize, out_channels, -1])
        f = tf.matmul(theta_x, phi_x)
        # ???
        f_softmax = tf.nn.softmax(f, -1)
        y = tf.matmul(f_softmax, g_x)
        y = tf.reshape(y, [batchsize, height, width, out_channels])
        with tf.variable_scope('w') as scope:
            w_y = slim.conv2d(y, in_channels, [1,1], stride=1, scope='w')
            if is_bn:
                w_y = slim.batch_norm(w_y)
        z = input_x + w_y
        return z
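Regarding the remark that a conv op is a MatMul in maths: here is a small self-contained NumPy check (illustrative only, not from ops.py) that a 1x1 convolution over an NHWC tensor is exactly a matrix multiplication applied independently at every spatial location:

import numpy as np

B, H, W, C_in, C_out = 2, 4, 4, 16, 8
x = np.random.randn(B, H, W, C_in)
w = np.random.randn(C_in, C_out)   # a 1x1 kernel squeezed to a (C_in, C_out) matrix

# "1x1 conv": apply w to the channel vector at each (b, h, w) position
conv_out = np.einsum('bhwc,cd->bhwd', x, w)

# matmul view: flatten the B*H*W positions, multiply once, reshape back
matmul_out = (x.reshape(-1, C_in) @ w).reshape(B, H, W, C_out)

print(np.allclose(conv_out, matmul_out))  # True

So the difference between the two implementations is mostly which ops are used to express the same computation, not the computation itself.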
@675492062 I wonder why this implementation uses g_x = tf.reshape(g, [batchsize, out_channels, -1]); g_x = tf.transpose(g_x, [0, 2, 1]) after computing g or theta. In my opinion, the output of slim.conv2d should be [batch_size, height, width, out_channels]. So, if we reshape the output to [batchsize, out_channels, -1] and then transpose it, I think it may mess up the dimensions of the matrix. Why not just reshape to [batchsize, -1, out_channels]?
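To make this concrete, here is a tiny sketch of my own (plain NumPy; tf.reshape and tf.transpose follow the same row-major semantics) showing why reshaping an NHWC tensor to [batchsize, out_channels, -1] and transposing scrambles positions and channels, while [batchsize, -1, out_channels] keeps one feature vector per spatial location:

import numpy as np

B, H, W, C = 1, 2, 2, 3
g = np.arange(B * H * W * C).reshape(B, H, W, C)   # NHWC, like slim.conv2d output

# what the current code does
wrong = np.transpose(g.reshape(B, C, -1), (0, 2, 1))
# what the question above suggests
right = g.reshape(B, -1, C)

print(g[0, 0, 0])   # [0 1 2]  -> the true feature of location (0, 0)
print(wrong[0, 0])  # [0 4 8]  -> mixes values from three different locations
print(right[0, 0])  # [0 1 2]  -> matches the true feature

Under this reading, [batchsize, -1, out_channels] (or, equivalently, transposing to NCHW before flattening) would be the layout-consistent choice.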
@YongyiTang92 Yes, I think so. The code on GitHub is not guaranteed to be correct. Ha-ha!
@YongyiTang92 @675492062 Thank you for the reminder; I will check the code and revise it if necessary.
@RaoHaocheng Actually, the matmul formulation is just another version of the non-local block.
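As a side note, the Caffe2 code above also has a non-softmax branch (cfg.NONLOCAL.USE_SOFTMAX is False), where the affinity matrix is simply divided by the number of positions instead of being passed through a softmax. Roughly, in NumPy terms (my own sketch, for illustration only):

import numpy as np

B, C, N, M = 2, 64, 49, 49
theta = np.random.randn(B, C, N)
phi = np.random.randn(B, C, M)
g = np.random.randn(B, C, M)

affinity = np.einsum('bcn,bcm->bnm', theta, phi)   # (B, N, M) pairwise affinities
p = affinity / M                                   # normalize by the number of positions
y = np.einsum('bcm,bnm->bcn', g, p)                # (B, C, N) aggregated features

Both branches reduce to batched matrix multiplications around the 1x1 projections.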
@nnUyi Well, have you trained it on other datasets?
I have tried changing the reshape operation; however, whether I change it or not, the non-local model does not show better performance. I also tried deleting "nonlocal_block1 = NonLocalBlock(cnv1_pool, 32, scope='nonlocal_block1')" and "nonlocal_block2 = NonLocalBlock(cnv2_pool, 64, scope='nonlocal_block2')" to disable the non-local blocks. To my surprise, the network without the non-local blocks yields better results. Have you experienced this? Thanks~
Just try it on more different tasks. Of course, make sure your ideas and code are correct. In my experiments, some tasks improved a little, but the gains were very small, and others did not improve at all.