graph-rcnn.pytorch

Train this model... but OOM

Open jgyy4775 opened this issue 5 years ago • 11 comments

I want to train this model, but I run out of memory. I am using a GTX Titan X.

I think there is a memory leak...

Which GPU is right for this model?

jgyy4775 avatar Aug 26 '19 15:08 jgyy4775

can you post your command?

jwyang avatar Aug 27 '19 02:08 jwyang

@jwyang I use this command "python main.py --config-file configs/sgg_res101_step.yaml"

jgyy4775 avatar Aug 27 '19 03:08 jgyy4775

You need to reduce the batch size. The default is 8 for 8 GPUs.

jwyang avatar Aug 27 '19 03:08 jwyang

A single GPU can usually hold 1 or 2 images.

jwyang avatar Aug 27 '19 03:08 jwyang
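
The arithmetic behind this advice can be sketched in a few lines. This is only an illustration: it assumes the default batch size of 8 is a global batch split evenly across GPUs (as "8 for 8 gpus" suggests), and it takes the "1 or 2 images" figure above as the per-GPU limit. The helper names `images_per_gpu` and `fits_single_gpu` are hypothetical and not part of graph-rcnn.pytorch.

```python
def images_per_gpu(global_batch_size: int, num_gpus: int) -> int:
    """Images each GPU must hold when a global batch is split evenly."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by num_gpus")
    return global_batch_size // num_gpus


def fits_single_gpu(global_batch_size: int, num_gpus: int,
                    max_images_per_gpu: int = 2) -> bool:
    """True if the per-GPU share stays within the assumed limit (2 images)."""
    return images_per_gpu(global_batch_size, num_gpus) <= max_images_per_gpu


# Default setting: 8 images over 8 GPUs is 1 image per GPU.
print(images_per_gpu(8, 8))
# Same default on one GPU: all 8 images land on a single card.
print(images_per_gpu(8, 1), fits_single_gpu(8, 1))
```

Under these assumptions, running the default configuration on one card asks it to hold 8 images at once, which is why the batch size has to come down to 1 or 2.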

@jwyang I reduced the batch size, learning rate, and image size, but the result is still "out of memory".

jgyy4775 avatar Aug 27 '19 03:08 jgyy4775

Even with batch_size=1? Can you show me the output in your terminal?

jwyang avatar Aug 27 '19 03:08 jwyang

Yes, the batch size is "1". [image]

jgyy4775 avatar Aug 27 '19 03:08 jgyy4775

When does this happen? At the beginning, or after training has run for a while?

jwyang avatar Aug 27 '19 03:08 jwyang

beginning..

jgyy4775 avatar Aug 27 '19 03:08 jgyy4775

During training with batch_size 2 I get no OOM, but during inference the result is "out of memory" after 40/26446...

simonJJJ avatar Sep 07 '19 10:09 simonJJJ
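
Memory running out partway through inference, rather than on the first batch, is a pattern that can appear when autograd state or per-batch outputs accumulate on the GPU. Whether that is the cause here is not established in this thread. A minimal PyTorch sketch of the usual precautions follows, assuming a generic `model` and an iterable of input tensors; `run_inference` is a hypothetical helper, not a function from graph-rcnn.pytorch.

```python
import torch


@torch.no_grad()  # disable autograd so no computation graph is kept
def run_inference(model, batches, device="cpu"):
    """Run `model` over `batches` and return the outputs as CPU tensors."""
    model.eval()
    model.to(device)
    results = []
    for batch in batches:
        output = model(batch.to(device))
        # Move each result off the accelerator so stored outputs
        # do not pile up in GPU memory across iterations.
        results.append(output.cpu())
    return results


# Toy usage with a stand-in model (assumption: any nn.Module works here).
toy_model = torch.nn.Linear(4, 2)
toy_batches = [torch.randn(3, 4) for _ in range(5)]
outputs = run_inference(toy_model, toy_batches)
```

On a CUDA machine one would pass `device="cuda"`; the stored results still end up in host memory.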

So this means if you don't have a GPU with at least 10GB RAM, you can't run this model at all? That's a shame..

dreichCSL avatar Jan 08 '20 16:01 dreichCSL