FastMaskRCNN
Hello, everyone! Running python train/train.py, I now hit this issue: ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape [1,256,160,551]. Is my NVIDIA card out of memory? GPU device: GeForce GTX 1050 Ti, 4.0 GB
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,256,160,551]
[[Node: pyramid_1/P2/rpn/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pyramid/C2/fusion/BiasAdd, pyramid/P2/rpn/weights/read)]]
[[Node: pyramid_1/AssignGTBoxes/Equal_5/_1175 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_9290_pyramid_1/AssignGTBoxes/Equal_5", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/cpu:0"]]
I think it is out of memory.
Yes, OOM means out of memory. You have a huge number of channels (which I guess you can't change). Reduce the height and width of the input image and try again (use something like 1x48x48xc and gradually increase it to see what your card's limits are).
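If it helps, here is a rough, standalone sketch of that probing idea, assuming TensorFlow 1.x (which this repo targets); the 256-channel 3x3 conv and the starting size are just illustrative, not FastMaskRCNN's actual layers:

```python
import tensorflow as tf

def fits_on_gpu(height, width, channels=256):
    """Try one forward pass of a 3x3 conv at the given input size."""
    tf.reset_default_graph()
    x = tf.random_normal([1, height, width, channels])
    y = tf.layers.conv2d(x, filters=channels, kernel_size=3, padding='same')
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # only grab GPU memory as needed
    try:
        with tf.Session(config=config) as sess:
            sess.run(tf.global_variables_initializer())
            sess.run(y)
        return True
    except tf.errors.ResourceExhaustedError:
        return False

# Start small and double the spatial size until the card runs out of memory.
size = 48
while fits_on_gpu(size, size):
    print('fits:', size)
    size *= 2
print('OOM somewhere between', size // 2, 'and', size)
```

This only measures a single conv layer, so the full FPN/RPN graph will hit its limit much earlier, but it gives you a feel for the card's ceiling.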
8 GB is sometimes not enough for segmentation, let alone 4 GB.
You will need at least 10 GB for the current image height and width settings; if you can't arrange that much, you will have to reduce the image dimensions.
From the original paper https://arxiv.org/pdf/1703.06870.pdf
Our models can run at about 200ms per frame on a GPU, and training on COCO takes one to two days on a single 8-GPU machine
This model runs at 195ms per image on an Nvidia Tesla M40 GPU
Assuming they used the same GPU for inference as for training, each Nvidia Tesla M40 provides 12 GB, so the 8-GPU machine had 96 GB in total.
However, I guess that the memory is not split up among computations, so a single GPU with 12 GB should be enough (but this is just a guess).
@kevinkit @blitu12345 @PavlosMelissinos
Thanks a lot!
@kevinkit can I reduce the number of images to resolve that problem?
It won't make any difference.
The problem is that even a single batch won't fit on your gpu. For that reason you need to reduce at least one of your input's shape values: [1,256,160,551]
batch_size is already 1, so it can't be reduced.
You need to reduce either the shape of your input image or the number of channels. Of those numbers, 551 is the weirdest one. What kind of dataset has 551 classes anyway?
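If the large dimensions really are the image's height and width, one common workaround is to cap the longer side before the image reaches the network. This is a hedged sketch, not this repo's actual preprocessing; it assumes the image arrives as an HxWx3 NumPy array:

```python
import cv2

def cap_image_size(image, max_side=600):
    """Downscale so that max(H, W) <= max_side, keeping the aspect ratio."""
    h, w = image.shape[:2]
    scale = min(1.0, float(max_side) / max(h, w))
    if scale < 1.0:
        # cv2.resize takes (width, height), not (height, width)
        image = cv2.resize(image, (int(w * scale), int(h * scale)),
                           interpolation=cv2.INTER_LINEAR)
    return image, scale  # keep the scale so boxes/masks can be rescaled too
```

Whatever value you pick for max_side, remember to apply the same scale factor to the ground-truth boxes and masks.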
My GPU has 6 GB of memory. I resized the height and width to 1/2, and it runs well now.
@anthony123 I got the same issue with 6 GB of memory. Did you finally get the code to work? How did you do this resize?
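For reference, halving the height and width inside a TF 1.x input pipeline usually looks something like the sketch below. Where exactly to hook it into train/train.py depends on the repo's preprocessing code, so treat this as illustrative rather than the actual change @anthony123 made:

```python
import tensorflow as tf

def halve_image(image):
    """Resize an HxWxC image tensor to half its height and width."""
    new_size = tf.shape(image)[:2] // 2  # [H // 2, W // 2]
    # Ground-truth boxes and masks need the same 0.5 factor applied.
    return tf.image.resize_images(image, new_size,
                                  method=tf.image.ResizeMethod.BILINEAR)
```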