Memory leak on compressed predict requests with oatpp
Configuration
- Version of DeepDetect:
  - [ ] Locally compiled on:
    - [ ] Ubuntu 18.04 LTS
    - [ ] Other:
  - [ ] Docker CPU
  - [X] Docker GPU
  - [ ] Amazon AMI
- Commit (shown by the server when starting): 23bd913ac180b56eddbf90c71d1f2e8bc2310c54
Your question / the problem you're facing:
When using the latest versions of DeDe (0.18.0 and 0.17.0 at least), I have noticed a memory leak (similar to https://github.com/jolibrain/deepdetect/issues/1260). I thought it had been fixed, but the test below suggests it has not. FYI, tests were run on a 1080 Ti GPU.
Error message (if any) / steps to reproduce the problem:
First, I run a container using the following image:
CALL
docker run --name dd-test --gpus device=0 -p 8080:8080 jolibrain/deepdetect_gpu_tensorrt:v0.18.0
LOG
=====================
== NVIDIA TensorRT ==
=====================
NVIDIA Release 21.04 (build 22393618)
NVIDIA TensorRT 7.2.3 (c) 2016-2021, NVIDIA CORPORATION. All rights reserved.
Container image (c) 2021, NVIDIA CORPORATION. All rights reserved.
https://developer.nvidia.com/tensorrt
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh
To install the open-source samples corresponding to this TensorRT release version run /opt/tensorrt/install_opensource.sh.
To build the open source parsers, plugins, and samples for current top-of-tree on master or a different branch, run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.
DeepDetect v0.18.0-dirty (dev)
GIT REF: heads/v0.18.0:23bd913ac180b56eddbf90c71d1f2e8bc2310c54
COMPILE_FLAGS: USE_CAFFE2=OFF USE_TF=OFF USE_NCNN=OFF USE_TORCH=OFF USE_HDF5=ON USE_CAFFE=OFF USE_TENSORRT=ON USE_TENSORRT_OSS=OFF USE_DLIB=OFF USE_CUDA_CV=OFF USE_SIMSEARCH=OFF USE_ANNOY=OFF USE_FAISS=ON USE_COMMAND_LINE=ON USE_JSON_API=ON USE_HTTP_SERVER=OFF
DEPS_VERSION: OPENCV_VERSION=4.2.0 CUDA_VERSION=11.3 CUDNN_VERSION= TENSORRT_VERSION=21.04
[2021-07-04 21:47:20.374] [api] [info] DeepDetect HTTP server listening on 0.0.0.0:8080
Then I create a service using an nsfw model.
CALL
curl -X PUT http://localhost:8080/services/nsfw -d '{
"description": "nsfw classification service",
"model": {
"repository": "/tmp/models/nsfw",
"create_repository": true,
"init":"https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz"
},
"mllib": "tensorrt",
"type": "supervised",
"parameters": {
"input": {
"connector": "image"
}
}
}
'
LOG
DEPS_VERSION: OPENCV_VERSION=4.2.0 CUDA_VERSION=11.3 CUDNN_VERSION= TENSORRT_VERSION=21.04
[2021-07-04 21:47:20.374] [api] [info] DeepDetect HTTP server listening on 0.0.0.0:8080
[2021-07-04 21:48:49.115] [api] [info] Downloading init model https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchTilePlugin_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchedNMS_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CoordConvAC version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CropAndResize version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::CropAndResizeDynamic version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::DetectionLayer_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::FlattenConcat_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GenerateDetection_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GridAnchor_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::GridAnchorRect_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::InstanceNormalization_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::LReLU_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::NMS_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::NMSDynamic_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Normalize_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::PriorBox_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ProposalLayer_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Proposal version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ProposalDynamic version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Region_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Reorg_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::ResizeNearest_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::RPROI_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::SpecialSlice_TRT version 1
[2021-07-04 21:48:49.696] [nsfw] [info] Registered plugin creator - ::Split version 1
[2021-07-04 21:49:00.571] [nsfw] [info] trying to determine the input size...
[2021-07-04 21:49:00.585] [nsfw] [info] found 224x224 as input size
[2021-07-04 21:49:00.585] [api] [info] HTTP/1.1 "PUT /services/nsfw" <n/a> 201 11471ms
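For convenience, the same service creation can also be done from Python; a minimal sketch using requests, mirroring the curl call above (same payload and endpoint):

```python
import requests

# Same payload as the curl call above; the server is assumed to
# listen on localhost:8080 as in the docker run call.
service = {
    "description": "nsfw classification service",
    "model": {
        "repository": "/tmp/models/nsfw",
        "create_repository": True,
        "init": "https://deepdetect.com/models/init/desktop/images/classification/nsfw.tar.gz",
    },
    "mllib": "tensorrt",
    "type": "supervised",
    "parameters": {"input": {"connector": "image"}},
}

r = requests.put("http://localhost:8080/services/nsfw", json=service)
print(r.status_code, r.json())
```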
Then I launch many predictions with a fixed batch size, using the script dd_test.py pasted below.
CALL
import json
import sys
import random

import requests

# Get random data
def get_random_images(number_images=1000, height=600, width=600):
    images = ["https://picsum.photos/id/{}/{}/{}".format(x, height, width) for x in range(number_images)]
    return images

LISTEN_URL = "http://localhost"
LISTEN_PORT = "8080"
NUMBER_IMAGES = 1000  # Number of images to use

clf_post = {
    "service": "NAME",
    "parameters": {
        "output": {
            "best": 3
        },
        "mllib": {
            "gpu": True
        }
    },
    "data": []
}

services = {'nsfw': {'bbox': False, 'size': 224}}
url_images = get_random_images(NUMBER_IMAGES)
print(services)

# Launch predictions
nb_run = 10
for j in range(nb_run):
    for i in range(0, NUMBER_IMAGES, 6):
        data = url_images[i:i+6]
        for elem, val in services.items():
            clf_post["data"] = data
            clf_post["service"] = elem
            tmp = requests.post("{}:{}/predict".format(LISTEN_URL, LISTEN_PORT), data=json.dumps(clf_post))
LOG
....
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(Pooling): pool, Tactic: -1, eltwise_stage3_block2[Float(1024,7,7)] -> pool[Float(1024,1,1)]
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(CublasConvolution): fc_nsfw, Tactic: 0, pool[Float(1024,1,1)] -> fc_nsfw[Float(2,1,1)]
[2021-07-04 21:50:05.144] [nsfw] [info] Layer(SoftMax): prob, Tactic: 1001, fc_nsfw[Float(2,1,1)] -> prob[Float(2,1,1)]
[2021-07-04 21:50:05.285] [nsfw] [info] Allocated persistent device memory of size 31235584
[2021-07-04 21:50:05.286] [nsfw] [info] Allocated activation device memory of size 272154624
[2021-07-04 21:50:05.286] [nsfw] [info] Assigning persistent memory blocks for various profiles
[2021-07-04 21:50:05.286] [nsfw] [info] detected output dimensions: [2, 1 1 0]
[2021-07-04 21:50:05.534] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 8386ms
[2021-07-04 21:50:05.716] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 177ms
[2021-07-04 21:50:05.895] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 176ms
[2021-07-04 21:50:06.079] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 181ms
[2021-07-04 21:50:06.302] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 218ms
[2021-07-04 21:50:06.505] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 198ms
[2021-07-04 21:50:06.714] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 205ms
[2021-07-04 21:50:06.894] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 176ms
[2021-07-04 21:50:07.086] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 189ms
[2021-07-04 21:50:07.273] [api] [info] HTTP/1.1 "POST /predict" nsfw 200 183ms
Now if you check the evolution of the RAM used, we observe an increase (from 1644 MB at the beginning to 2095 MB after 5 minutes).
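One simple way to watch this evolution is to poll docker stats while the prediction loop runs; a minimal sketch, assuming the container is named dd-test as in the docker run call above:

```python
import subprocess
import time

# Poll the container's memory usage once per second for ~5 minutes.
for _ in range(300):
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", "dd-test"],
        capture_output=True,
        text=True,
    )
    print(time.strftime("%H:%M:%S"), out.stdout.strip())
    time.sleep(1)
```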
More info on that: we managed to reproduce this issue with the above Python script, but not with curl.
So after testing more with @YaYaB, we had a strong intuition that it had something to do with the HTTP serving.
After analysing HTTP headers, we found that Python requests asks for a gzip-encoded answer by default (Accept-Encoding: gzip, deflate), while curl doesn't.
So we manually set this header in curl, and finally reproduced the issue with curl too.
We also tested sending gzip-compressed queries while asking for uncompressed responses, and no memory leak was noticed. So it really looks like something related to GZIP compression.
Actually, it is not even related to TensorRT: it also happens with classical Caffe predictions, with or without GPU.
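To summarize the variants described above in one place, here is a sketch of the three kinds of requests (a minimal payload is used for illustration; the actual tests used the clf_post dict from dd_test.py):

```python
import gzip
import json

import requests

url = "http://localhost:8080/predict"
payload = json.dumps({"service": "nsfw", "data": ["https://picsum.photos/id/0/600/600"]})

# 1. Default requests behaviour: the client sends "Accept-Encoding: gzip, deflate",
#    so the server gzip-encodes the response -> server memory grows.
requests.post(url, data=payload)

# 2. Ask for an uncompressed response -> no leak observed.
requests.post(url, data=payload, headers={"Accept-Encoding": "identity"})

# 3. Gzip-compress the request body, still asking for an uncompressed
#    response -> no leak observed either.
requests.post(
    url,
    data=gzip.compress(payload.encode()),
    headers={"Content-Encoding": "gzip", "Accept-Encoding": "identity"},
)
```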
@rguilmont @YaYaB gzip/deflate compression is handled by https://github.com/oatpp/oatpp-zlib from within https://github.com/oatpp/oatpp. The components are simply added here: https://github.com/jolibrain/deepdetect/blob/master/src/http/app_component.hpp#L114
Running valgrind on dede with gzip queries only shows the possible leak below. This looks like an init from libz directly, reached from the oatpp send function: deflateInit allocates internal zlib state that is only released by a matching deflateEnd, so the trace suggests that call is never made.
@lganzzzo Hi, the ::send function seems to leak from deflateInit. Have you seen this before, or are we doing something wrong? Thanks.
Libz init memory reported by valgrind:
==3020638== 536,192 (11,904 direct, 524,288 indirect) bytes in 2 blocks are definitely lost in loss record 4,799 of 4,801
==3020638== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3020638== by 0x5AA3418: deflateInit2_ (in /lib/x86_64-linux-gnu/libz.so.1.2.11)
==3020638== by 0x5AA3651: deflateInit_ (in /lib/x86_64-linux-gnu/libz.so.1.2.11)
==3020638== by 0x71C063: oatpp::zlib::DeflateEncoder::DeflateEncoder(long, bool, int) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x71B2E8: oatpp::zlib::DeflateEncoderProvider::getProcessor() (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x6EA02D: oatpp::web::protocol::http::outgoing::Response::send(oatpp::data::stream::OutputStream*, oatpp::data::stream::BufferOutputStream*, oatpp::web::protocol::http::encoding::EncoderProvider*) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x6F6DB6: oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&) (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x6FB28F: oatpp::web::server::HttpProcessor::Task::run() (in /home/beniz/projects/deepdetect/dev/deepdetect/build/main/dede)
==3020638== by 0x945BDE3: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==3020638== by 0x936B608: start_thread (pthread_create.c:477)
==3020638== by 0x9837292: clone (clone.S:95)
Hey @beniz,
Your code looks good. Most probably it's on the oatpp side. I'll take a closer look.
Hi @lganzzzo, how are things? Do you have any fresh lead on this by any chance? I've seen issues with libz a long time ago, so this could still be outside oatpp.
Hey @beniz,
Yes, at this point it looks like a libz issue.
I'm filing an issue in oatpp to investigate possible fixes.
It might take a while.
Thanks a lot guys.
FYI, we've mitigated this gzip issue by setting up an Envoy proxy in front of DeepDetect that takes care of compressing and decompressing requests.