tensorrt_demos
The mtcnn demo consumes a lot of RAM.
2.35 GB of RAM is being consumed. Is there any way to reduce it? Any reason why it is this much?
The TensorRT PNet engine takes the most memory. Please read my Optimizing TensorRT MTCNN blog post, as well as the comments in mtcnn/det1_relu.prototxt. Make sure you understand the design.
One way to reduce memory consumption is to trade off how large an input image you'd like to process (or how small the faces you'd like to detect are) against the memory consumption (and inference speed) of the TensorRT MTCNN code.
My design is for 1280x720 input images. If, for example, you reduce that to 640x360, memory consumption could be reduced to roughly 1/4. (But then only faces at least twice as large could be detected.)
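For a quick sanity check of those numbers, here is some illustrative arithmetic (my own back-of-the-envelope sketch, not measured TensorRT memory usage):

```python
# Rough trade-off arithmetic (illustration only, not measured TensorRT numbers).
orig_w, orig_h = 1280, 720
new_w, new_h = 640, 360

# The stacked PNet input (and its activation buffers) scale with the pixel count.
print((new_w * new_h) / (orig_w * orig_h))   # 0.25 -> roughly 1/4 the PNet memory

# Halving the resolution doubles the smallest face size that can still be detected,
# since a face must still cover roughly 'minsize' pixels after downscaling.
print(orig_w / new_w)                        # 2.0
```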
How do I do the dimension calculation?
# Max allowed input image size as: 1280x720
# 'minsize' = 40
#
# Input dimension of the 1st 'scale':
# 720 * 12 / 40 = 216
# 1280 * 12 / 40 = 384
# H's in all scales: (scale factor = 0.709)
# Original: 216.0, 153.1, 108.6, 77.0, 54.6, 38.7, 27.4, 19.5, 13.8, (9.8)
# Rounded: 216, 154, 108, 78, 54, 38, 28, 20, 14
# Offsets: 0, 216, 370, 478, 556, 610, 648, 676, 696, (710)
#
# Input dimension of the 'stacked image': 710x384
#
# Output dimension: (stride=2)
# (710 - 12) / 2 + 1 = 350
# (384 - 12) / 2 + 1 = 187
#
I am unable to understand what the above calculation means. How did 710 come from 720? And what would my offsets be for 640x360?
I tried a few values but I am getting a dimension error. I was able to speed up the inference (66 FPS -> 78 FPS on an RTX 3090) by reducing the max batch sizes of RNet and ONet, but there was no improvement in memory consumption.
I am unable to understand what the above calculation means.
The calculation corresponds to stacking different scales of the original image from top to bottom. The "Offsets" are the y-axis values of (top-left corner of) all those scaled images.
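If it helps, here is a small Python sketch (my own illustration, not code from the repo) that reproduces the numbers in the det1_relu.prototxt comment from the rounded heights, and shows where 710 comes from:

```python
# Even-rounded heights of all scaled images for a 1280x720 input
# (the "Rounded" line in the det1_relu.prototxt comment):
heights = [216, 154, 108, 78, 54, 38, 28, 20, 14]

# "Offsets" = y coordinate of the top edge of each scaled image
# in the vertically stacked PNet input image.
offsets = [0]
for h in heights[:-1]:
    offsets.append(offsets[-1] + h)
print(offsets)              # [0, 216, 370, 478, 556, 610, 648, 676, 696]

# The stacked image height is simply the sum of all scaled heights --
# this is where 710 comes from (it is unrelated to the original 720).
stacked_h = sum(heights)    # 710
stacked_w = 1280 * 12 // 40 # 384
print(stacked_h, stacked_w)

# PNet output dimensions for a 12x12 window with stride=2:
print((stacked_h - 12) // 2 + 1)   # 350
print((stacked_w - 12) // 2 + 1)   # 187
```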

Okay
I tried to replicate the same calculation.
In utils/mtcnn.py:

input_h_offsets = (0, 185, 205, 279, 307, 327, 341, 351, 358)
output_h_offsets = (0, 92, 102, 139, 153, 163, 170, 175, 179)

The prototxt:

input_param { shape: { dim: 1 dim: 3 dim: 358 dim: 108 } }
It is running now at 120 FPS, but memory consumption is not reduced and no output is coming out.
No detection is happening; no faces are detected even in simple images from my test set.
- The PNet was designed with stride=2, so you need to round the "h" and "w" numbers to even numbers.
- In addition to modifying the input dimension of the TensorRT PNet engine, you'd also need to modify the "utils/mtcnn.py" source code accordingly (see the sketch after this list).
- Input dimension of the TensorRT PNet engine (for the stacked multi-scale input image): https://github.com/jkjung-avt/tensorrt_demos/blob/9c5e68244b611f886e8906a811066cf3b548346e/mtcnn/det1_relu.prototxt#L26
- The corresponding "utils/mtcnn.py" source code: https://github.com/jkjung-avt/tensorrt_demos/blob/9c5e68244b611f886e8906a811066cf3b548346e/utils/mtcnn.py#L228-L230
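For reference, here is a rough sketch of how candidate even-rounded heights and offsets for a 640x360 input could be derived. This is my own illustration, not the repo's code, and the rounding of borderline scales is an assumption, so double-check the results against the method in the det1_relu.prototxt comment before plugging them into the prototxt and utils/mtcnn.py:

```python
def even(x):
    """Round to the nearest even integer (PNet uses stride=2)."""
    return 2 * int(x / 2 + 0.5)

def pyramid_heights(img_h, minsize=40, factor=0.709, pnet_size=12):
    """Even-rounded heights of all image-pyramid levels, largest first."""
    h = img_h * pnet_size / minsize      # 1st scale: 360 * 12 / 40 = 108
    heights = []
    while h >= pnet_size:                # stop once a level gets smaller than 12
        heights.append(even(h))
        h *= factor
    return heights

heights = pyramid_heights(360)           # heights for a 640x360 input
offsets = [0]
for h in heights[:-1]:
    offsets.append(offsets[-1] + h)

stacked_h = sum(heights)                 # candidate height 'dim' for det1_relu.prototxt
stacked_w = even(640 * 12 / 40)          # candidate width 'dim'
print(heights)                           # [108, 76, 54, 38, 28, 20, 14]
print(offsets)                           # [0, 108, 184, 238, 276, 304, 324]
print(stacked_h, stacked_w)              # 338 192
```

The matching input/output offsets in utils/mtcnn.py would then need to be updated consistently (the output side follows the (H - 12) / 2 + 1, stride-2 relationship shown earlier). I have not run these exact numbers against the repo, so treat them as a starting point rather than verified values.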
Ok, thanks. I actually did modify utils/mtcnn.py, but not to the nearest even numbers. I will do that and then update you. Also, I think TensorRT inherently takes 2.35 GB of RAM (the YOLOv4 and mtcnn demos from your repository that I tried suggest the same). Is there any way to reduce that?
I think TensorRT inherently takes 2.35 GB of RAM (the YOLOv4 and mtcnn demos from your repository that I tried suggest the same). Is there any way to reduce that?
Sorry, I have no idea.