
ncnn uses much more memory than other inference frameworks.

Open cyrusbehr opened this issue 3 years ago • 5 comments

I recently ran a benchmark comparing the inference speed and memory usage of various inference frameworks, including ncnn, mxnet, onnxruntime, and openvino. For this benchmark, I used the Insightface arcface resnet100 model.

While ncnn inference speed on x86 is comparable to many of the other large frameworks (I was impressed by this, good work!), I saw that it uses significantly more memory to run inference than other frameworks.

For reference, ncnn used 1.7 GB of RAM to run this model, while the other frameworks used between 0.37 GB and 0.57 GB of RAM.

Is this a bug? Or does ncnn simply use more memory? You can find the full benchmarks here (benchmark numbers reported in readme).

cyrusbehr avatar Mar 11 '21 18:03 cyrusbehr

[memory_bar: bar chart comparing peak memory usage across the benchmarked frameworks]

cyrusbehr avatar Mar 11 '21 18:03 cyrusbehr

With opt.use_packing_layout = false, ncnn used 0.8 GB of RAM.
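For anyone looking for where to set this: the option lives on `ncnn::Net::opt` and must be set before loading the model. A minimal sketch, assuming hypothetical model file names (`arcface_r100.param` / `arcface_r100.bin`):

```cpp
#include "net.h"  // ncnn

int main()
{
    ncnn::Net net;

    // Trade speed for memory: disable the packed (interleaved) tensor
    // layout used by the SIMD-optimized kernels.
    net.opt.use_packing_layout = false;

    // Hypothetical file names; substitute your own converted model.
    net.load_param("arcface_r100.param");
    net.load_model("arcface_r100.bin");

    return 0;
}
```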

ncnnnnn avatar Mar 18 '21 08:03 ncnnnnn

Yes, that's true, although disabling that option also significantly slows down inference on machines with SIMD support such as AVX2. Ultimately I would like the AVX2 optimizations without the large memory overhead, but that does not appear to be available right now; perhaps it is a good goal to work towards.

cyrusbehr avatar Mar 18 '21 16:03 cyrusbehr

I'm also experiencing this issue. With a quantized int8 model of ~65 MB, on both iOS and Android the memory used is about 500-600 MB, 3x-4x more than with TFLite or Core ML. Is there any workaround, even one that decreases inference speed, to lower the memory usage? I tried opt.use_packing_layout = false as suggested, but it doesn't help at all; maybe because the model is quantized?

mtamburrano avatar Sep 16 '21 16:09 mtamburrano

I don't know if it is too late to answer this issue; I met the same situation as you. It is because ncnn has some acceleration algorithms which consume more memory. If you want to reduce the usage, you can set opt.use_winograd_convolution to false. In my project, r101 uses only about 450 MB.
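The suggestion above can be sketched the same way as the packing-layout workaround: set the flag on `ncnn::Net::opt` before loading the model. The sgemm flag and the model file names below are my additions for illustration, not from this thread:

```cpp
#include "net.h"  // ncnn

int main()
{
    ncnn::Net net;

    // Winograd convolution is faster but keeps large transformed weight
    // buffers resident; disabling it trades speed for a smaller footprint.
    net.opt.use_winograd_convolution = false;

    // Assumption: the sgemm convolution path has a similar
    // memory-for-speed trade-off and can also be disabled.
    net.opt.use_sgemm_convolution = false;

    net.load_param("r101.param");  // hypothetical file names
    net.load_model("r101.bin");

    return 0;
}
```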

eguoguo321 avatar Aug 18 '22 07:08 eguoguo321