tf-faster-rcnn ResourceExhaustedError

My GPU is a GTX 1060, and the demo runs successfully. But when I try to train, something goes wrong. I used this command: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16. Can anyone help me? Thanks!

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[25088,4096]
[[Node: gradients/vgg_16_2/fc6/kernel/Regularizer/l2_regularizer/L2Loss_grad/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](vgg_16/fc6/weights/read, gradients/vgg_16_2/fc6/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency_1)]]
[[Node: LOSS_default/add_5/_253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1113_LOSS_default/add_5", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Jan 13 '19 05:01 suixin567

Can anyone help me?

Jan 15 '19 09:01 suixin567

same problem here

Feb 02 '19 17:02 christianvari

You can reduce the model's memory usage by lowering some of its parameters.

Here are some parameters to try. Change them one at a time or in combination. Ideally, reduce the values only as much as you need to.

In lib/model/config.py:
__C.TRAIN.SCALES = (600,)
__C.TRAIN.MAX_SIZE = 1000

And in experiments/cfgs/vgg16.yml:
RPN_BATCHSIZE: 256
BATCH_SIZE: 256
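As a rough illustration of what "reducing" these values could look like, here is a small Python sketch. It only computes smaller candidates from the four values quoted above; the helper name `shrink` and the 80% / 50% factors are made up for this example and are not part of tf-faster-rcnn. You would still copy the resulting numbers into config.py and vgg16.yml by hand.

```python
# Values quoted in the post above (the settings to reduce).
CURRENT = {
    "SCALES": (600,),      # __C.TRAIN.SCALES in lib/model/config.py
    "MAX_SIZE": 1000,      # __C.TRAIN.MAX_SIZE in lib/model/config.py
    "RPN_BATCHSIZE": 256,  # experiments/cfgs/vgg16.yml
    "BATCH_SIZE": 256,     # experiments/cfgs/vgg16.yml
}


def shrink(settings, image_percent=80, batch_percent=50):
    """Hypothetical helper: return a copy with smaller image and batch sizes.

    The percentages are arbitrary starting points, not recommended values.
    Integer arithmetic is used so the results are exact.
    """
    return {
        "SCALES": tuple(s * image_percent // 100 for s in settings["SCALES"]),
        "MAX_SIZE": settings["MAX_SIZE"] * image_percent // 100,
        "RPN_BATCHSIZE": settings["RPN_BATCHSIZE"] * batch_percent // 100,
        "BATCH_SIZE": settings["BATCH_SIZE"] * batch_percent // 100,
    }


reduced = shrink(CURRENT)
for name, value in reduced.items():
    print(name, "=", value)
```

Smaller images and fewer sampled regions usually cost some accuracy, so it is probably worth starting with one setting and stopping as soon as training fits in memory.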

Feb 02 '19 18:02 rnsandeep

Thank you, but the problem remains. It happens while loading the weights from vgg16.ckpt.

Feb 03 '19 08:02 christianvari

Obviously, it will happen when loading the weights. Try res50. The GPU you are using doesn't have enough memory to train these networks. How much memory does your GPU have?
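To put a rough number on this, the tensor in the error message has shape [25088,4096] and type DT_FLOAT (4 bytes per element). The sketch below works out how much memory one such tensor takes. The factor of three (weights, gradient, one optimizer slot) is an assumption about a momentum-style optimizer, not something read from the traceback.

```python
def tensor_mib(shape, bytes_per_element=4):
    """Size of a dense tensor in MiB (DT_FLOAT is 4 bytes per element)."""
    elements = 1
    for dim in shape:
        elements *= dim
    return elements * bytes_per_element / 2**20


# Shape taken from the OOM message for the vgg16 fc6 layer.
fc6_mib = tensor_mib((25088, 4096))
print("one fc6-sized tensor:", fc6_mib, "MiB")

# Assumption: training keeps the weights, their gradient, and one
# optimizer slot (for example momentum) at the same time.
copies = 3
print("fc6 during training (assumed 3 copies):", fc6_mib * copies, "MiB")
```

So this single layer can account for more than 1 GiB before any activations or the rest of the network are counted, which is a lot for a GTX 1060 (the card is sold in 3 GB and 6 GB variants).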

Feb 03 '19 09:02 rnsandeep

I solved this problem by using the res101 pretrained weights in place of vgg16. Good luck!

Feb 21 '19 02:02 hanlaoshi

I solved this problem by restarting the terminal.
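One possible reason this helps (an assumption, it is not confirmed anywhere in this thread) is that an earlier run was still alive and holding GPU memory. A quick way to check before training is to ask nvidia-smi how much memory is already in use. The sketch below wraps that query in Python; the function names are made up for this example.

```python
import subprocess

# Standard nvidia-smi query: used and total memory per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Turn lines like '5912, 6078' into a list of (used, total) int pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Run nvidia-smi and return (used_mib, total_mib) for each GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout)
```

Calling `gpu_memory()` on the training machine returns one pair per GPU. If the used value is already close to the total before training starts, another process is probably occupying the card, and closing it (or the terminal that started it) should free the memory.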

May 10 '20 10:05 Zx07
