residual-attention-network
GPU memory
You mentioned in the paper that the batch size for training is 32. However, on my GPU, which has 8 GB of memory, I set the batch size to 8 and almost all of the memory is already used.
So I am wondering: how much memory does your GPU have?
Thanks.
Hi, what kind of network do you train (Attention-56 or Attention-92)?
@fwang91 attention-56.
Thx.
BTW, do I have to initialize the BN layer like this? I am trying to write the train_val.prototxt, but the loss does not decrease; it just stays flat.
layer {
  name: "post_res_4_3/branch1/conv2_3x3/bn"
  type: "BN"
  bottom: "post_res_4_3/branch1/conv2_3x3"
  top: "post_res_4_3/branch1/conv2_3x3/bn"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  bn_param {
    frozen: true
    slope_filler {
      type: "xavier"
      std: 0.1
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}
We train Attention-56 with memory optimization; memory usage is about 6 GB with 32 images per card.
I think your BN setting is not suitable for classification: the learning rate for the mean and variance must be zero.
The lr_mult and decay_mult of the scale and shift are 1, and during the training stage I do not freeze the BN parameters.
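For reference, here is a minimal sketch of a BN layer matching that description. It assumes a four-blob BN implementation (scale, shift, running mean, running variance), as in similar Caffe forks; the layer names are taken from the snippet above, and the exact blob order may differ in the authors' code:

layer {
  name: "post_res_4_3/branch1/conv2_3x3/bn"
  type: "BN"
  bottom: "post_res_4_3/branch1/conv2_3x3"
  top: "post_res_4_3/branch1/conv2_3x3/bn"
  param { lr_mult: 1 decay_mult: 1 }  # scale (slope): learned by the solver
  param { lr_mult: 1 decay_mult: 1 }  # shift (bias): learned by the solver
  param { lr_mult: 0 decay_mult: 0 }  # running mean: lr must be zero
  param { lr_mult: 0 decay_mult: 0 }  # running variance: lr must be zero
  bn_param {
    frozen: false  # statistics keep updating during training
    slope_filler { type: "constant" value: 1 }
    bias_filler { type: "constant" value: 0 }
  }
}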
@fwang91 In your code there is this comment:
// If true, will use the moving average mean and std for training and test.
and the README says:
We use moving average in the training stage
This suggests I should set BN's frozen to true to use the "moving average" in the training stage. But above you said you do not freeze the BN parameters, so you set frozen to false when training, right? Also, could you please give me any hints/papers/code on how to optimize the memory usage that you mentioned? Thank you very much.
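For what it's worth, one common memory trick in Caffe (not necessarily the optimization the authors used) is to run activation layers in place by giving top the same name as bottom, so the output overwrites the input blob instead of allocating a new one; the names below are hypothetical:

layer {
  name: "post_res_4_3/relu"  # hypothetical layer
  type: "ReLU"
  bottom: "post_res_4_3"
  top: "post_res_4_3"  # same blob as bottom: computed in place, saving one activation's memory
}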
@melody-rain I also ran into the problem of the loss not decreasing. Did you solve it? Thanks!
I was able to train Attention-56 on ImageNet with a batch size of 50 on an NVIDIA GTX 1080 Ti with 8 GB of memory. However, I reimplemented the network in TensorFlow.
@qinxianyuzi Did you manage to solve it?
@wjzh1 I replaced the network's BN layers with the standard BatchNorm layer and added Scale layers. The network converges, but the results are not as good as the author reports. Perhaps my modification is incorrect.
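For reference, the conversion described above typically looks like this in stock Caffe, with hypothetical blob names: a BatchNorm layer whose three internal blobs (mean, variance, moving-average factor) are maintained by the layer rather than the solver, followed by a Scale layer that supplies the learnable gamma and beta:

layer {
  name: "conv1/bn"  # hypothetical name
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1/bn"
  param { lr_mult: 0 }  # running mean: not updated by the solver
  param { lr_mult: 0 }  # running variance: not updated by the solver
  param { lr_mult: 0 }  # moving-average factor: not updated by the solver
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1/bn"
  top: "conv1/bn"  # applied in place on the BN output
  scale_param { bias_term: true }  # learnable gamma (scale) and beta (bias)
}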
@qinxianyuzi May I ask whether your Caffe environment was compiled under CUDA 8.0? Under CUDA 8.0 + cuDNN v5 I get this error: 1 error detected in the compilation of "/tmp/tmpxft_00002780_00000000-5_interp.cpp4.ii
@Muzijiajian I did not use the Caffe provided by the author.
@melody-rain Hi, I ran into the same problem: the loss does not decrease. Have you solved it?
@qinxianyuzi Did you manage to solve it?
@zallmight No I did not. I gave up...
> I was able to train Attention-56 on ImageNet with a batch size of 50 on an NVIDIA GTX 1080 Ti with 8 GB of memory. However, I reimplemented the network in TensorFlow.

Would you mind sharing your TensorFlow training code? Thanks.