
Inference uses too much GPU memory

Open ivannson opened this issue 4 years ago • 3 comments

Hi,

I have trained a model on my own data and am now trying to run inference. When running infer_img.py on a single image (640x480) I see my GPU (GeForce GTX 1080) usage jump to 7.5 GB, which seems excessive for one image. Is this expected behaviour?

Do you have any suggestions on how to decrease GPU usage during inference? I only have 8 GB of GPU memory, and I need to run a simulation as well as a couple of other inference scripts at the same time as the data comes in.

I would also just like to say thanks for open sourcing your work. From what I've seen, this is one of the best detection/segmentation projects out there in terms of code quality and readability, as well as good explanations of how to get everything working.

ivannson avatar Feb 07 '20 10:02 ivannson

Hi,

Thanks for the props :) Is that the peak GPU consumption at the beginning, or does it stay that high throughout inference? Which model are you using?

tano297 avatar Feb 10 '20 09:02 tano297

Hi,

I was using my own dataset to train a segmentation model based on the COCO config file (MobileNetV2). I then ran the infer_img.py script to get predictions for one image (for some reason it wasn't working when I pointed it at a folder of images). The GPU usage increased very quickly to 7 GB, then dropped back to 0 once inference finished.

I have also trained another model on a similar dataset but using the Cityscapes config file (ERFNet). Running inference with this model only used around 2.7 GB of GPU memory.

I was working on adapting the inference script to handle multiple images, and when I run that, the GPU usage is a lot lower, around 1.5 GB I think. Not sure why that happens.
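
Roughly, my adapted loop looks like the sketch below. The model and transform are placeholders for whatever the bonnetal inference code actually constructs, so treat it as pseudocode rather than a drop-in script:

import glob

import torch
from PIL import Image

image_paths = sorted(glob.glob("/path/to/all/images/*.png"))

model = ...      # placeholder: the trained bonnetal model, loaded and moved to the GPU
transform = ...  # placeholder: the same preprocessing used during training

model.eval()
with torch.no_grad():  # no autograd buffers, so memory stays low
    for path in image_paths:
        img = Image.open(path).convert("RGB")
        tensor = transform(img).unsqueeze(0).cuda()          # 1 x C x H x W
        logits = model(tensor)
        pred = logits.argmax(dim=1).squeeze(0).cpu().numpy()  # H x W class map
        # ... save or visualize pred here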

ivannson avatar Feb 14 '20 15:02 ivannson

Hi,

It may be that at the beginning of inference cuDNN is trying lots of different strategies, and some of them use a lot of memory.
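
If you want to rule that out, you can switch the cuDNN autotuner off. This is just the standard PyTorch flag, so think of it as a quick experiment rather than a recommended default:

import torch

# Don't let cuDNN benchmark several convolution algorithms (some of which
# need large workspaces) on the first forward pass; use its default heuristic.
torch.backends.cudnn.benchmark = False

# After a warm-up pass you can also return cached blocks to the driver; this
# makes the nvidia-smi number smaller but does not change the true peak.
torch.cuda.empty_cache()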

Another possibility (which I have fixed in our internal version) is this line.

You may want to change it to:

with torch.no_grad():
  _, skips = self.backbone(stub)

This line is there to profile the backbone and see the size of the skip connections. It allows me to adapt the decoder to any type of encoder, making backbone and decoder design significantly easier. However, without the no_grad guard, that call also saves every activation as if you needed them for training, which is not desired. Give this a shot and let me know if it still spikes; this is likely the culprit.
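
If you want to see the effect in isolation, here is a small self-contained toy (not bonnetal code, just a stack of convolutions standing in for the backbone) that compares the peak memory of the same forward pass with and without the guard:

import torch
import torch.nn as nn

# A deep stack of convolutions stands in for the backbone.
net = nn.Sequential(*[nn.Conv2d(64, 64, 3, padding=1) for _ in range(20)]).cuda().eval()
x = torch.randn(1, 64, 240, 320, device="cuda")

def peak_mib(use_no_grad):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    if use_no_grad:
        with torch.no_grad():
            net(x)
    else:
        out = net(x)  # autograd keeps every intermediate activation alive
        del out
    return torch.cuda.max_memory_allocated() / 2**20

print("peak with grad tracking : %.0f MiB" % peak_mib(False))
print("peak inside no_grad     : %.0f MiB" % peak_mib(True))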

BTW, the infer_img.py script can already take multiple images.

For example:

./infer_img.py -i /path/to/all/images/*.png

would infer all images in that directory.

As a final comment, I suggest you give the TensorRT runtime option a try. It will make your model WAY faster.
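
In case it helps, the usual route (a rough sketch, not the exact bonnetal deployment pipeline) is to export the trained model to ONNX first and then build a TensorRT engine from that:

import torch

model = ...  # placeholder: the trained model, loaded the same way infer_img.py does
model.cuda().eval()

# Dummy input in NCHW layout; match it to your real image size.
dummy = torch.zeros(1, 3, 480, 640, device="cuda")

torch.onnx.export(
    model,
    dummy,
    "segmentation.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=11,
)

# Then, for example:
#   trtexec --onnx=segmentation.onnx --fp16 --saveEngine=segmentation.trt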

tano297 avatar Feb 17 '20 09:02 tano297