
GPU memory

Open melody-rain opened this issue 7 years ago • 15 comments

You mentioned in the paper that the batch size for training is 32. However, on my GPU, which has 8 GB of memory, I set the batch size to 8 and almost all the memory is already used.

So I am wondering how much memory your GPU has?

Thanks.

melody-rain avatar Aug 31 '17 03:08 melody-rain

Hi, what kind of network do you train? (Attention-56 or Attention-92?)

fwang91 avatar Aug 31 '17 04:08 fwang91

@fwang91 attention-56.

Thx.

melody-rain avatar Aug 31 '17 05:08 melody-rain

BTW, do I have to initialize the BN layer like this? I am trying to write the train_val.prototxt, but the loss does not decrease; it just stays flat.

layer {
  name: "post_res_4_3/branch1/conv2_3x3/bn"
  type: "BN"
  bottom: "post_res_4_3/branch1/conv2_3x3"
  top: "post_res_4_3/branch1/conv2_3x3/bn"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  bn_param {
    frozen: true
    slope_filler {
      type: "xavier"
      std: 0.1
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}

melody-rain avatar Aug 31 '17 05:08 melody-rain

We train Attention-56 with memory optimization, and the memory usage is about 6 GB with 32 images per card.
I think your BN setting is not suitable for classification: the lr_mult of the mean and variance must be zero, while the lr_mult and decay_mult of the scale and shift are 1. And during the training stage, I do not freeze the BN params.

fwang91 avatar Aug 31 '17 06:08 fwang91
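For reference, here is a minimal sketch of the training-time BN setting fwang91 describes, assuming the custom "BN" layer in this repo's Caffe fork exposes four param blocks in the order scale (slope), shift (bias), moving mean, moving variance, as similar forks do; the filler values are also assumptions, not taken from the authors' prototxt.

layer {
  name: "post_res_4_3/branch1/conv2_3x3/bn"
  type: "BN"
  bottom: "post_res_4_3/branch1/conv2_3x3"
  top: "post_res_4_3/branch1/conv2_3x3/bn"
  param { lr_mult: 1 decay_mult: 1 }   # scale (slope): learned
  param { lr_mult: 1 decay_mult: 1 }   # shift (bias): learned
  param { lr_mult: 0 decay_mult: 0 }   # moving mean: not updated by the solver
  param { lr_mult: 0 decay_mult: 0 }   # moving variance: not updated by the solver
  bn_param {
    frozen: false                      # do not freeze BN during training
    slope_filler { type: "constant" value: 1 }
    bias_filler  { type: "constant" value: 0 }
  }
}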

@fwang91 In your code

// If true, will use the moving average mean and std for training and test.

and in the README

 We use moving average in the training stage

This means I should set BN's frozen to true to use "moving average" in the training stage.

So you set BN's frozen to false when training, right? Also, could you please give me any hints/papers/code on how to do the memory optimization you mentioned? Thank you very much.

melody-rain avatar Aug 31 '17 07:08 melody-rain
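For what it's worth, the code comment quoted above suggests frozen only controls whether the moving-average statistics are used; a common setup (an assumption, not confirmed by fwang91 in this thread) is to leave it false in train_val.prototxt, as he describes, and set it true in the deploy/test prototxt:

# train_val.prototxt: compute batch statistics and update the moving averages
bn_param { frozen: false }

# deploy.prototxt: use the accumulated moving-average mean and std
bn_param { frozen: true }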

@melody-rain I also ran into the loss not decreasing. Did you manage to solve it? Thanks!

qinxianyuzi avatar Dec 04 '17 10:12 qinxianyuzi

I could train Attention-56 on ImageNet with batch size 50 on NVIDIA GTX 1080Ti with 8GB of memory. However, I reimplemented the network in Tensorflow.

ondrejbiza avatar Dec 04 '17 15:12 ondrejbiza

@qinxianyuzi Did you manage to solve it?

wjzh1 avatar Mar 16 '18 03:03 wjzh1

@wjzh1 I replaced the network's BN layers with the original BatchNorm layer and added Scale layers. The network converges, but the results are not as good as the authors report. -- Maybe this change of mine is not correct.

qinxianyuzi avatar Mar 16 '18 06:03 qinxianyuzi
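For reference, a minimal sketch of that replacement in stock Caffe, reusing the layer names from the earlier snippet; whether this exactly reproduces the custom BN layer's behaviour is an open question, as qinxianyuzi notes.

layer {
  name: "post_res_4_3/branch1/conv2_3x3/bn"
  type: "BatchNorm"
  bottom: "post_res_4_3/branch1/conv2_3x3"
  top: "post_res_4_3/branch1/conv2_3x3/bn"
  # let Caffe compute batch statistics and update the running mean/variance
  batch_norm_param { use_global_stats: false }
}
layer {
  name: "post_res_4_3/branch1/conv2_3x3/scale"
  type: "Scale"
  bottom: "post_res_4_3/branch1/conv2_3x3/bn"
  top: "post_res_4_3/branch1/conv2_3x3/bn"
  # learnable scale and shift, standing in for the custom layer's slope/bias
  scale_param { bias_term: true }
}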

@qinxianyuzi May I ask whether your Caffe environment was compiled under CUDA 8.0? Under CUDA 8.0 + cuDNN v5 I get this error: 1 error detected in the compilation of "/tmp/tmpxft_00002780_00000000-5_interp.cpp4.ii

Muzijiajian avatar Apr 03 '18 02:04 Muzijiajian

@Muzijiajian I did not use the Caffe provided by the author.

qinxianyuzi avatar Apr 09 '18 09:04 qinxianyuzi

@melody-rain Hi, I met the same problem: the loss does not decrease. Have you solved it?

zallmight avatar Apr 25 '18 13:04 zallmight

@qinxianyuzi Did you manage to solve it?

zallmight avatar Apr 25 '18 13:04 zallmight

@zallmight No I did not. I gave up...

melody-rain avatar Apr 26 '18 01:04 melody-rain

I could train Attention-56 on ImageNet with batch size 50 on NVIDIA GTX 1080Ti with 8GB of memory. However, I reimplemented the network in Tensorflow.

Would you mind sharing your training code in TensorFlow? Thanks.

Albert337 avatar Nov 22 '18 02:11 Albert337