tf-faster-rcnn

ResourceExhaustedError

Open suixin567 opened this issue 5 years ago • 7 comments

My GPU is a GTX 1060, and the demo ran successfully. But when I try to train, something goes wrong. I used this command:

./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16

Can anyone help? Thanks!

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[25088,4096]
[[Node: gradients/vgg_16_2/fc6/kernel/Regularizer/l2_regularizer/L2Loss_grad/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](vgg_16/fc6/weights/read, gradients/vgg_16_2/fc6/kernel/Regularizer/l2_regularizer_grad/tuple/control_dependency_1)]]
[[Node: LOSS_default/add_5/_253 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1113_LOSS_default/add_5", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

suixin567, Jan 13 '19 05:01
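For a sense of scale, the tensor named in the error can be sized by hand. This is a rough sketch, assuming 4 bytes per element because the failing node is DT_FLOAT; the helper name `tensor_mib` is made up for illustration and is not part of the repo.

```python
from math import prod

def tensor_mib(shape, bytes_per_element=4):
    """Size of a dense tensor in MiB (assumes 4-byte floats by default)."""
    return prod(shape) * bytes_per_element / 2**20

# Shape taken from the OOM message: the fc6 weight matrix of VGG16.
fc6_mib = tensor_mib((25088, 4096))
print(f"fc6 tensor: {fc6_mib:.0f} MiB")
```

The failing node is a gradient with the same shape as the fc6 weights, so the weights and this gradient together need roughly twice that amount, before counting activations or any optimizer state.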

Can anyone help me?

suixin567, Jan 15 '19 09:01

Same problem here.

christianvari, Feb 02 '19 17:02

You can reduce the model's memory usage by lowering some parameters.

Here are the parameters to try. Change one at a time, or several in combination. Ideally, reduce them as little as you can get away with.

In lib/model/config.py:
__C.TRAIN.SCALES = (600,)
__C.TRAIN.MAX_SIZE = 1000

In experiments/cfgs/vgg16.yml:
RPN_BATCHSIZE: 256
BATCH_SIZE: 256

rnsandeep, Feb 02 '19 18:02
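To see how much the two image-size parameters matter, here is a minimal sketch of the resize rule commonly used in Faster R-CNN implementations: the shorter side is scaled to the target size, unless that would push the longer side past the maximum. It is assumed, not confirmed in this thread, that this repo follows the same rule. The function name `scaled_size`, the 375x500 image size, and the reduced values 400 and 667 are illustrative choices.

```python
def scaled_size(height, width, target_size=600, max_size=1000):
    """Return (new_height, new_width) under the assumed resize rule."""
    short_side, long_side = min(height, width), max(height, width)
    scale = target_size / short_side
    # Cap the scale so the longer side does not exceed max_size.
    if round(scale * long_side) > max_size:
        scale = max_size / long_side
    return round(height * scale), round(width * scale)

# Illustrative 375x500 input image, resized with two hypothetical settings.
full = scaled_size(375, 500, target_size=600, max_size=1000)
small = scaled_size(375, 500, target_size=400, max_size=667)
ratio = (small[0] * small[1]) / (full[0] * full[1])
print(full, small, f"pixel ratio: {ratio:.2f}")
```

Smaller images shrink the convolutional activations, but the fully connected weights (such as the [25088, 4096] tensor in the error) keep the same size regardless of these settings.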

Thank you, but the problem remains. It happens while the weights are being loaded from vgg16.ckpt.

christianvari, Feb 03 '19 08:02

Of course, it will fail while the weights are being loaded. Try res50 instead. The GPU you are using doesn't have enough memory to train these networks. How much memory does your GPU have?

rnsandeep, Feb 03 '19 09:02
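One way to answer the memory question is to ask `nvidia-smi`, which ships with the NVIDIA driver. The sketch below wraps its CSV query output in Python. The function names and the sample row are made up for illustration; the sample is not a measurement from anyone in this thread.

```python
import subprocess

def parse_gpu_memory(csv_text):
    """Parse 'name, total, used' CSV rows (values in MiB) into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma in the GPU name cannot break parsing.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
    return gpus

def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory; needs the NVIDIA driver tools."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)

# Hypothetical sample row, so the parser can be tried without a GPU.
sample = "GeForce GTX 1060 6GB, 6078, 5120\n"
gpus = parse_gpu_memory(sample)
print(gpus[0]["total_mib"] - gpus[0]["used_mib"], "MiB free")
```

On a machine with an NVIDIA GPU, calling `query_gpu_memory()` before training starts shows how much memory is already taken. A high used figure at that point suggests another process is still holding GPU memory.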

I solved this problem by using the res101 pretrained weights instead of vgg16. Good luck!

hanlaoshi, Feb 21 '19 02:02

I solved this problem by restarting the terminal.

Zx07, May 10 '20 10:05