Memory leak on compressed predict requests with oatpp
Configuration
- Version of DeepDetect:
  - [ ] Locally compiled on:
    - [ ] Ubuntu 18.04 LTS
    - [ ] Other:
  - [ ] Docker CPU
  - [X] Docker GPU
  - [ ] Amazon AMI
- Commit (shown by the server when starting): 23bd913ac180b56eddbf90c71d1f2e8bc2310c54
Your question / the problem you're facing:
When using the latest versions of DeDe (0.18.0 and 0.17.0 at least), I have noticed a memory leak (similar to https://github.com/jolibrain/deepdetect/issues/1260). I thought it had been fixed, but the test below suggests it has not. FYI, tests were run on a 1080 Ti GPU.
Error message (if any) / steps to reproduce the problem:
First, I run a container using the following image:
CALL
docker run --name dd-test --gpus device=0 -p 8080:8080 jolibrain/deepdetect_gpu_tensorrt:v0.18.0
LOG
=====================
== NVIDIA TensorRT ==
=====================
NVIDIA Release 21.04 (build 22393618)
NVIDIA TensorRT 7.2.3 (c) 2016-2021, NVIDIA CORPORATION. All rights reserved.
Container image (c) 2021, NVIDIA CORPORATION. All rights reserved.
https://developer.nvidia.com/tensorrt
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh
To install the open-source samples corresponding to this TensorRT release version run /opt/tensorrt/install_opensource.sh.
To build the open source parsers, plugins, and samples for current top-of-tree on master or a different branch, run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.
DeepDetect v0.18.0-dirty (dev)
GIT REF: heads/v0.18.0:23bd913ac180b56eddbf90c71d1f2e8bc2310c54
COMPILE_FLAGS: USE_CAFFE2=OFF USE_TF=OFF USE_NCNN=OFF USE_TORCH=OFF USE_HDF5=ON USE_CAFFE=OFF USE_TENSORRT=ON USE_TENSORRT_OSS=OFF USE_DLIB=OFF USE_CUDA_CV=OFF USE_SIMSEARCH=OFF USE_ANNOY=OFF USE_FAISS=ON USE_COMMAND_LINE=ON USE_JSON_API=ON USE_HTTP_SERVER=OFF
DEPS_VERSION: OPENCV_VERSION=4.2.0 CUDA_VERSION=11.3 CUDNN_VERSION= TENSORRT_VERSION=21.04
[2021-07-04 21:47:20.374] [api] [info] DeepDetect HTTP server listening on 0.0.0.0:8080
Then I create a service using an nsfw model.
CALL
curl -X PUT http://localhost:8080/services/nsfw -d '{
"description": "nsfw classification service",
"model": {
"repository": "/tmp/models/nsfw",
"create_repository": true,
"init":"https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz"
},
"mllib": "tensorrt",
"type": "supervised",
"parameters": {
"input": {
"connector": "image"
}
}
}
'
LOG
DEPS_VERSION: OPENCV_VERSION=4.2.0 CUDA_VERSION=11.3 CUDNN_VERSION= TENSORRT_VERSION=21.04
[2021-07-04 21:47:20.374] [api] [info] DeepDetect HTTP server listening on 0.0.0.0:8080
[2021-07-04 21:48:49.115] [api] [info] Downloading init model https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchTilePlugin_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchedNMS_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CoordConvAC version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CropAndResize version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CropAndResizeDynamic version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::DetectionLayer_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::FlattenConcat_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GenerateDetection_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GridAnchor_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GridAnchorRect_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::InstanceNormalization_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::LReLU_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::NMS_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::NMSDynamic_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Normalize_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::PriorBox_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ProposalLayer_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Proposal version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ProposalDynamic version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Region_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Reorg_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ResizeNearest_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::RPROI_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::SpecialSlice_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Split version 1
[2021-07-04 21:49:00.571] [nsfw] [info] trying to determine the input size...
[2021-07-04 21:49:00.585] [nsfw] [info] found 224x224 as input size
[2021-07-04 21:49:00.585] [api] [info] HTTP/1.1 "PUT /services/nsfw" <n/a> 201 11471ms
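For convenience, the same service creation can also be done from Python; a minimal sketch using requests, mirroring the curl call above (same payload and endpoint):

```python
import requests

# Same payload as the curl call above; the server is assumed to
# listen on localhost:8080 as in the docker run call.
service = {
    "description": "nsfw classification service",
    "model": {
        "repository": "/tmp/models/nsfw",
        "create_repository": True,
        "init": "https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz",
    },
    "mllib": "tensorrt",
    "type": "supervised",
    "parameters": {"input": {"connector": "image"}},
}

r = requests.put("http://localhost:8080/services/nsfw", json=service)
print(r.status_code, r.json())
```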
Then I launch many predictions with a fixed batch size, using the script dd_test.py pasted below.
CALL
import json
import sys
import random

import requests

# Get random data
def get_random_images(number_images=1000, height=600, width=600):
    images = ["https://picsum.photos/id/{}/{}/{}".format(x, height, width) for x in range(number_images)]
    return images

LISTEN_URL = "http://localhost"
LISTEN_PORT = "8080"
NUMBER_IMAGES = 1000  # Number of images to use

clf_post = {
    "service": "NAME",
    "parameters": {
        "output": {
            "best": 3
        },
        "mllib": {
            "gpu": True
        }
    },
    "data": []
}

services = {'nsfw': {'bbox': False, 'size': 224}}
url_images = get_random_images(NUMBER_IMAGES)
print(services)

# Launch predictions
nb_run = 10
for j in range(nb_run):
    for i in range(0, NUMBER_IMAGES, 6):
        data = url_images[i:i+6]
        for elem, val in services.items():
            clf_post["data"] = data
            clf_post["service"] = elem
            tmp = requests.post("{}:{}/predict".format(LISTEN_URL, LISTEN_PORT), data=json.dumps(clf_post))
LOG
....
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(Pooling): pool, Tactic: -1, eltwise_stage3_block2[Float(1024,7,7)] -> pool[Float(1024,1,1)]
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(CublasConvolution): fc_nsfw, Tactic: 0, pool[Float(1024,1,1)] -> fc_nsfw[Float(2,1,1)]
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(SoftMax): prob, Tactic: 1001, fc_nsfw[Float(2,1,1)] -> prob[Float(2,1,1)]
[2021-07-04 21:50:05.285] [nsfw] [info] Allocated persistent device memory of size 31235584
[2021-07-04 21:50:05.286] [nsfw] [info] Allocated activation device memory of size 272154624
[2021-07-04 21:50:05.286] [nsfw] [info] Assigning persistent memory blocks for various profiles
[2021-07-04 21:50:05.286] [nsfw] [info] detected output dimensions: [2, 1 1 0]
[2021-07-04 21:50:05.534] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 8386ms
[2021-07-04 21:50:05.716] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 177ms
[2021-07-04 21:50:05.895] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 176ms
[2021-07-04 21:50:06.079] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 181ms
[2021-07-04 21:50:06.302] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 218ms
[2021-07-04 21:50:06.505] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 198ms
[2021-07-04 21:50:06.714] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 205ms
[2021-07-04 21:50:06.894] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 176ms
[2021-07-04 21:50:07.086] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 189ms
[2021-07-04 21:50:07.273] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 183ms
Now if you check the evolution of the RAM used, we observe an increase (from 1644 MB at the beginning to 2095 MB after 5 minutes).
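One simple way to watch this evolution is to poll docker stats while the prediction loop runs; a minimal sketch, assuming the container is named dd-test as in the docker run call above:

```python
import subprocess
import time

# Poll the container's memory usage once per second for ~5 minutes.
for _ in range(300):
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", "dd-test"],
        capture_output=True,
        text=True,
    )
    print(time.strftime("%H:%M:%S"), out.stdout.strip())
    time.sleep(1)
```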
More info on that: we managed to reproduce this issue with the above Python script, but not with curl.
So after testing more with @YaYaB, we had a strong intuition that it had something to do with the HTTP serving.
After analysing HTTP headers, we found that Python requests asks for a gzip-encoded answer by default (Accept-Encoding: gzip, deflate), while curl doesn't.
So we manually set this header in curl, and finally reproduced the issue with curl too.
We also tested sending gzip-compressed queries while asking for uncompressed responses, and no memory leak was noticed. So it really looks like something related to GZIP compression.
Actually, it is not even related to TensorRT: it also happens with classical Caffe predictions, with or without GPU.
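To summarize the variants described above in one place, here is a sketch of the three kinds of requests (a minimal payload is used for illustration; the actual tests used the clf_post dict from dd_test.py):

```python
import gzip
import json

import requests

url = "http://localhost:8080/predict"
payload = json.dumps({"service": "nsfw", "data": ["https://picsum.photos/id/0/600/600"]})

# 1. Default requests behaviour: the client sends "Accept-Encoding: gzip, deflate",
#    so the server gzip-encodes the response -> server memory grows.
requests.post(url, data=payload)

# 2. Ask for an uncompressed response -> no leak observed.
requests.post(url, data=payload, headers={"Accept-Encoding": "identity"})

# 3. Gzip-compress the request body, still asking for an uncompressed
#    response -> no leak observed either.
requests.post(
    url,
    data=gzip.compress(payload.encode()),
    headers={"Content-Encoding": "gzip", "Accept-Encoding": "identity"},
)
```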
@rguilmont @YaYaB gzip/deflate compression is handled by https://github.com/oatpp/oatpp-zlib from within https://github.com/oatpp/oatpp. The components are simply added here: https://github.com/jolibrain/deepdetect/blob/master/src/http/app_component.hpp#L114
Running valgrind on dede with gzip queries only shows the possible leak below. This looks like an init from libz directly, reached from the oatpp send function: deflateInit allocates internal zlib state that is only released by a matching deflateEnd, so the trace suggests that call is never made.
@lganzzzo Hi, the ::send function seems to leak from deflateInit. Have you seen this before, or are we doing something wrong? Thanks.
Libz init memory reported by valgrind:
==3020638== 536,192 (11,904 direct, 524,288 indirect) bytes in 2 blocks are definitely lost in loss record 4,799 of 4,801
==3020638== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3020638== by 0x5AA3418: deflateInit2_ (in /lib/x86_64-linux-gnu/libz.so.1.2.11)
==3020638== by 0x5AA3651: deflateInit_ (in /lib/x86_64-linux-gnu/libz.so.1.2.11)
==3020638== by 0x71C063: oatpp::zlib::DeflateEncoder::DeflateEncoder(long, bool, int) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x71B2E8: oatpp::zlib::DeflateEncoderProvider::getProcessor() (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x6EA02D: oatpp::web::protocol::http::outgoing::Response::send(oatpp::data::stream::OutputStream*, oatpp::data::stream::BufferOutputStream*, oatpp::web::protocol::http::encoding::EncoderProvider*) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x6F6DB6: oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x6FB28F: oatpp::web::server::HttpProcessor::Task::run() (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x945BDE3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3020638== by 0x936B608: start_thread (pthread_create.c:477)
==3020638== by 0x9837292: clone (clone.S:95)
Hey @beniz,
Your code looks good. Most probably it's on the oatpp side. I'll take a closer look.
Hi @lganzzzo, how are things? Do you have any fresh lead on this by any chance? I've seen issues with libz a long time ago, so this could still be outside oatpp.
Hey @beniz,
Yes, at this point it looks like a libz issue.
I'm filing an issue in oatpp to investigate possible fixes.
It might take a while.
Thanks a lot guys.
FYI, we've mitigated this gzip issue by setting up an Envoy proxy in front of DeepDetect that takes care of compressing and decompressing requests.