CV-CUDA
[BUG] GPU memory usage gradually increases
Describe the bug
When the same image is processed repeatedly, GPU memory usage stays stable at around 1GB. However, when different images are processed, memory usage grows over time to 4GB+ and then fluctuates within a small range. Note: the images are downloaded from URLs.
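To put numbers on the "around 1GB" versus "4GB+" observation, GPU memory could be sampled from inside the loop, for example every 100 images. The sketch below shells out to `nvidia-smi` with its documented `--query-gpu=memory.used --format=csv,noheader,nounits` options. The helper names `parse_memory_used_mib` and `sample_memory_used_mib` are hypothetical and not part of CV-CUDA. Keep in mind that `nvidia-smi` reports everything reserved on the device, including memory held by caching allocators, not only live tensors.

```python
import subprocess


def parse_memory_used_mib(csv_output: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`.

    nvidia-smi prints one integer (MiB) per selected GPU, one GPU per line.
    """
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def sample_memory_used_mib(device_id: int = 0) -> int:
    """Hypothetical helper: current memory.used (MiB) of one GPU via nvidia-smi."""
    out = subprocess.run(
        [
            "nvidia-smi",
            f"--id={device_id}",
            "--query-gpu=memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_memory_used_mib(out)[0]
```

Logging `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` at the same points, and `cp.get_default_memory_pool().total_bytes()` for CuPy's pool (used by the rotation path), would show which library holds the growing share.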
Steps/Code to reproduce bug
```python
# Imports added for completeness (the original snippet omitted them).
import cupy as cp
import cupyx.scipy.ndimage
import cv2
import cvcuda
import numpy as np
import pycuda.driver as cuda
import requests
import torch
from nvidia import nvimgcodec

# get_all_url_list_from_excel() and compute_img_blur_degree() are the
# reporter's own helpers and are not shown in this report.

batch_size = 1
max_cpu_threads = 2
device_id = 0
target_img_height = 768
target_img_width = 576

# Read image files
decoder = nvimgcodec.Decoder()
url = 'https://abcd/18e539ff-6fd3-451c-976a-5142a8eba362.jpg'
# Same-image input (the stable ~1GB case); the loop below iterates pic_list instead.
urls = [url for _ in range(1000000000)]
pic_list = get_all_url_list_from_excel('/home/xxx/Code/0415-0421.xlsx', 'Sheet2')

# Define the CUDA device, context and streams.
cuda.init()  # PyCUDA requires init() before any other pycuda.driver call
cuda_device = cuda.Device(device_id)
cuda_ctx = cuda_device.retain_primary_context()
# Use the default stream for CV-CUDA and torch.
# Since we never created a stream, current will be the CUDA default stream.
cvcuda_stream = cvcuda.Stream().current
torch_stream = torch.cuda.default_stream(device=device_id)

with cvcuda_stream, torch.cuda.stream(torch_stream):
    for url in pic_list:
        tensor_res = []
        cuda_ctx.push()
        t00_gpu = cv2.getTickCount()
        response = requests.get(url)
        t01_gpu = cv2.getTickCount()

        # Decode the image on the GPU
        t0_gpu = cv2.getTickCount()
        inputImage = decoder.decode(response.content)
        t1_gpu = cv2.getTickCount()

        # Rotate landscape images to portrait with CuPy
        if inputImage.width > inputImage.height:
            cp_img = cp.asarray(inputImage)
            cp_img_rotated = cupyx.scipy.ndimage.rotate(cp_img, 270)
            inputImage = nvimgcodec.as_image(cp_img_rotated)

        tb0_gpu = cv2.getTickCount()
        blur_degree = compute_img_blur_degree(inputImage)
        tb1_gpu = cv2.getTickCount()
        print(f'blur degree compute time: {(tb1_gpu - tb0_gpu) / cv2.getTickFrequency() * 1000:.4f}ms')

        # Convert to an NVCV tensor
        t2_gpu = cv2.getTickCount()
        nvcvInputTensor = cvcuda.as_tensor(inputImage, "HWC")
        # Need 4 dimensions, the first being the batch size
        nvcvInputTensor = cvcuda.stack([nvcvInputTensor])
        t3_gpu = cv2.getTickCount()

        t4_gpu = cv2.getTickCount()
        # Resize to the input network dimensions
        cvcuda_resize_tensor = cvcuda.resize(
            nvcvInputTensor,
            (batch_size, target_img_height, target_img_width, 3),
            cvcuda.Interp.AREA,
        )
        t5_gpu = cv2.getTickCount()
        torch_preprocessed_tensor = torch.as_tensor(
            cvcuda_resize_tensor.cuda(), device="cuda"
        )
        im_resize_gpu = torch_preprocessed_tensor.cpu().numpy()[0, :, :, ::-1]  # RGB -> BGR
        t6_gpu = cv2.getTickCount()

        # Convert to the data type and range of values needed by the input layer,
        # i.e. uint8 -> float32. A scale of 1/255 normalizes the values to 0-1.
        nvcvConvertTensor = cvcuda.convertto(cvcuda_resize_tensor, np.float32, scale=1 / 255)

        # The input to the network needs to be normalized based on the mean and
        # standard deviation values to standardize the input data.
        # Create torch tensors holding the mean and standard deviation for R, G, B.
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        scaleTensor = torch.Tensor(mean)
        stdTensor = torch.Tensor(std)
        # Reshape to the number of channels. The R, G, B mean and scale values are
        # applied to each color plane respectively across the batch.
        meanTensor = torch.reshape(scaleTensor, (1, 1, 1, 3)).cuda()
        stdTensor = torch.reshape(stdTensor, (1, 1, 1, 3)).cuda()
        # Wrap the torch tensors in CV-CUDA tensors
        nvcvMeanTensor = cvcuda.as_tensor(meanTensor, "NHWC")
        nvcvBaseTensor = cvcuda.as_tensor(stdTensor, "NHWC")
        # Apply the normalize operator and indicate the scale values are standard
        # deviations, i.e. scale = 1 / stddev
        nvcvNormTensor = cvcuda.normalize(
            nvcvConvertTensor,
            nvcvMeanTensor,
            nvcvBaseTensor,
            cvcuda.NormalizeFlags.SCALE_IS_STDDEV,
        )
        # The final stage in the preprocess pipeline converts the interleaved RGB
        # buffer into a planar buffer
        nvcvPreprocessedTensor = cvcuda.reformat(nvcvNormTensor, "NCHW")
        torch_preprocessed_tensor = torch.as_tensor(
            nvcvPreprocessedTensor.cuda(), device="cuda"
        )
        tensor_res.append(torch_preprocessed_tensor)
        t7_gpu = cv2.getTickCount()
```
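For scale, the output buffers whose size is fixed by `target_img_height` x `target_img_width` are small: the uint8 resize output plus three float32 tensors (convertto, normalize, reformat) come to roughly 16.5 MiB per iteration. The decoded input, by contrast, scales with the source resolution, so its size changes from image to image. The arithmetic below is a plain-Python sketch assuming 3-channel uint8 decoder output (as the `"HWC"` wrap above implies); the 4000x3000 resolution is a hypothetical example, not a measurement from the reported images.

```python
MIB = 1024 * 1024


def tensor_bytes(*shape: int, itemsize: int) -> int:
    """Bytes needed by a dense tensor with the given shape and element size."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize


def fixed_stage_bytes(batch: int = 1, height: int = 768, width: int = 576) -> int:
    """Outputs whose size depends only on the target size (batch, height, width).

    resize -> uint8 NHWC; convertto, normalize and reformat -> float32 (4 bytes each).
    """
    resized = tensor_bytes(batch, height, width, 3, itemsize=1)
    float_stage = tensor_bytes(batch, height, width, 3, itemsize=4)
    return resized + 3 * float_stage


def decoded_input_bytes(height: int, width: int, channels: int = 3) -> int:
    """Decoded HWC uint8 image; its size follows the source resolution."""
    return tensor_bytes(height, width, channels, itemsize=1)


print(f"fixed stages per iteration: {fixed_stage_bytes() / MIB:.2f} MiB")  # -> 16.45 MiB
# Hypothetical 4000x3000 source image, for illustration only
print(f"decoded 4000x3000 input: {decoded_input_bytes(4000, 3000) / MIB:.2f} MiB")  # -> 34.33 MiB
```

If the fixed part is only a few tens of MiB, a multi-GiB increase would suggest buffers that depend on the input size (decoding, the CuPy rotation, possibly whatever `compute_img_blur_degree` allocates), which fits the observation that repeating one image keeps usage flat.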
Expected behavior
GPU memory usage should remain stable regardless of which images are processed.
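One hypothesis consistent with both observations (flat usage when one image is repeated, growth followed by a plateau with small fluctuations across many images), and not confirmed by anything in this report, is a cache that keeps freed GPU buffers keyed by shape and only evicts them once a size limit is reached. Several components in this pipeline do cache device memory, for example PyTorch's caching allocator and CuPy's default memory pool. The toy model below is plain Python and is not the actual allocator of any of these libraries; `ShapeKeyedCache` and its 512 MiB limit are hypothetical.

```python
from collections import OrderedDict

MIB = 1024 * 1024


class ShapeKeyedCache:
    """Toy model (hypothetical): one cached buffer per (shape, itemsize) key,
    evicted least-recently-used first once the total exceeds limit_bytes."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self._buffers = OrderedDict()  # (shape, itemsize) -> size in bytes
        self.reserved_bytes = 0

    def acquire(self, shape: tuple, itemsize: int = 1) -> None:
        key = (shape, itemsize)
        if key in self._buffers:
            # Same shape as an earlier request: reuse it, no new allocation.
            self._buffers.move_to_end(key)
            return
        nbytes = itemsize
        for dim in shape:
            nbytes *= dim
        self._buffers[key] = nbytes
        self.reserved_bytes += nbytes
        # Evict the oldest entries (never the one just added) until under the limit.
        while self.reserved_bytes > self.limit_bytes and len(self._buffers) > 1:
            _, freed = self._buffers.popitem(last=False)
            self.reserved_bytes -= freed


same = ShapeKeyedCache(limit_bytes=512 * MIB)
for _ in range(1000):
    same.acquire((3000, 4000, 3))  # the same image size over and over

varied = ShapeKeyedCache(limit_bytes=512 * MIB)
trace = []
for i in range(1000):
    varied.acquire((3000 + i, 4000, 3))  # a slightly different size each time
    trace.append(varied.reserved_bytes)

print(f"repeated image: {same.reserved_bytes / MIB:.1f} MiB")
print(f"varied images: peak {max(trace) / MIB:.1f} MiB, last {trace[-1] / MIB:.1f} MiB")
```

With a single repeated shape, `reserved_bytes` stays at one buffer; with a new shape every iteration it climbs until the limit and then oscillates below it as old entries are evicted, the same pattern described in this report.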
Environment overview
- sys.platform: linux
- Python: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0]
- CUDA available: True
- MUSA available: False
- numpy_random_seed: 2147483648
- GPU 0: NVIDIA GeForce RTX 3070 Ti
- CUDA_HOME: /usr/local/cuda
- NVCC: Cuda compilation tools, release 11.6, V11.6.124
- GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
- PyTorch: 1.12.1
- PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.6
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.8.1 (built against CUDA 11.8)
- Built with CuDNN 8.3.2
Environment details
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 1
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 183
Model name: 13th Gen Intel(R) Core(TM) i7-13700KF
Stepping: 1
CPU MHz: 3400.000
CPU max MHz: 5400.0000
CPU min MHz: 800.0000
BogoMIPS: 6835.20
Virtualization: VT-x
L1d cache: 384 KiB
L1i cache: 256 KiB
L2 cache: 16 MiB
NUMA node0 CPU(s): 0-23
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr flush_l1d arch_capabilities
***CMake***
/usr/bin/cmake
cmake version 3.28.5
CMake suite maintained and supported by Kitware (kitware.com/cmake).
***g++***
/usr/bin/g++
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
***nvcc***
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
***Python***
/home/xxx/.conda/envs/jx_torch/bin/python
Python 3.10.14
***Environment Variables***
PATH : /home/xxx/.vscode-server/cli/servers/Stable-dc96b837cf6bb4af9cd736aa3af08cf8279f7685/server/bin/remote-cli:/home/xxx/.local/bin:/home/xxx/.conda/envs/jx_torch/bin:/home/zzhuaner/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin:/usr/local/TensorRT/bin:/home/xxx/.vscode-server/cli/servers/Stable-dc96b837cf6bb4af9cd736aa3af08cf8279f7685/server/bin/remote-cli:/home/xxx/.local/bin:/home/zzhuaner/anaconda3/bin:/home/zzhuaner/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin:/usr/local/TensorRT/bin:/usr/local/cuda/bin:/usr/local/TensorRT/bin:/usr/local/cuda/bin:/usr/local/TensorRT/bin
LD_LIBRARY_PATH : :/usr/local/cuda/lib64:/usr/local/TensorRT/lib:/home/xxx/Packages/onnxruntime-linux-x64-1.12.1/lib:/home/xxx/Packages/libtorch/lib:/usr/lib/x86_64-linux-gnu/:/usr/local/cuda/lib64:/usr/local/TensorRT/lib:/home/xxx/Packages/onnxruntime-linux-x64-1.12.1/lib:/home/xxx/Packages/libtorch/lib:/usr/lib/x86_64-linux-gnu/:/usr/local/cuda/lib64:/usr/local/TensorRT/lib:/home/xxx/Packages/onnxruntime-linux-x64-1.12.1/lib:/home/xxx/Packages/libtorch/lib:/usr/lib/x86_64-linux-gnu/
NUMBAPRO_NVVM :
NUMBAPRO_LIBDEVICE :
CONDA_PREFIX : /home/xxx/.conda/envs/jx_torch
PYTHON_PATH :
***conda packages***
conda is /home/zzhuaner/anaconda3/condabin/conda
/home/zzhuaner/anaconda3/condabin/conda
# packages in environment at /home/xxx/.conda/envs/jx_torch:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
addict 2.4.0 pypi_0 pypi
aenum 3.1.15 pypi_0 pypi
aliyun-python-sdk-core 2.15.1 pypi_0 pypi
aliyun-python-sdk-kms 2.16.3 pypi_0 pypi
appdirs 1.4.4 pypi_0 pypi
asynctest 0.13.0 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
blas 1.0 mkl conda-forge
brotli-python 1.1.0 py310hc6cd4ac_1 conda-forge
bzip2 1.0.8 h5eee18b_6 defaults
ca-certificates 2024.3.11 h06a4308_0 defaults
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
cfgv 3.4.0 pypi_0 pypi
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
contourpy 1.2.1 pypi_0 pypi
coverage 7.5.1 pypi_0 pypi
crcmod 1.7 pypi_0 pypi
cryptography 42.0.7 pypi_0 pypi
cudatoolkit 11.6.2 hfc3e2af_13 conda-forge
cupy 13.1.0 pypi_0 pypi
cvcuda-cu11 0.7.0b0 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
distlib 0.3.8 pypi_0 pypi
et-xmlfile 1.1.0 pypi_0 pypi
exceptiongroup 1.2.1 pypi_0 pypi
fastrlock 0.8.2 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.14.0 pypi_0 pypi
flake8 7.0.0 pypi_0 pypi
flatbuffers 24.3.25 pypi_0 pypi
fonttools 4.51.0 pypi_0 pypi
freetype 2.10.4 h0708190_1 conda-forge
fsspec 2024.3.1 pypi_0 pypi
gitdb 4.0.11 pypi_0 pypi
gitpython 3.1.43 pypi_0 pypi
gmp 6.3.0 h59595ed_1 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
grpcio 1.63.0 pypi_0 pypi
h5py 3.11.0 pypi_0 pypi
huggingface-hub 0.23.0 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
icu 73.2 h59595ed_0 conda-forge
identify 2.5.36 pypi_0 pypi
idna 3.7 pyhd8ed1ab_0 conda-forge
importlib-metadata 7.1.0 pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306 defaults
interrogate 1.7.0 pypi_0 pypi
isort 4.3.21 pypi_0 pypi
jmespath 0.10.0 pypi_0 pypi
jpeg 9e h0b41bf4_3 conda-forge
kiwisolver 1.4.5 pypi_0 pypi
lame 3.100 h166bdaf_1003 conda-forge
lcms2 2.15 hfd0df8a_0 conda-forge
ld_impl_linux-64 2.38 h1181459_1 defaults
lerc 3.0 h9c3ff4c_0 conda-forge
libblas 3.9.0 1_h86c2bf4_netlib conda-forge
libcblas 3.9.0 5_h92ddd45_netlib conda-forge
libdeflate 1.17 h0b41bf4_0 conda-forge
libffi 3.4.4 h6a678d5_1 defaults
libgcc-ng 13.2.0 h77fa898_7 conda-forge
libgfortran-ng 13.2.0 h69a702a_7 conda-forge
libgfortran5 13.2.0 hca663fb_7 conda-forge
libgomp 13.2.0 h77fa898_7 conda-forge
libhwloc 2.9.1 hd6dc26d_0 conda-forge
libiconv 1.17 hd590300_2 conda-forge
liblapack 3.9.0 5_h92ddd45_netlib conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libstdcxx-ng 13.2.0 hc0a3c3a_7 conda-forge
libtiff 4.5.1 h6a678d5_0 defaults
libuuid 1.41.5 h5eee18b_0 defaults
libwebp-base 1.4.0 hd590300_0 conda-forge
libxml2 2.10.4 hfdd30dd_2 defaults
lz4-c 1.9.4 hcb278e6_0 conda-forge
mako 1.3.3 pypi_0 pypi
markdown 3.6 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.8.4 pypi_0 pypi
mccabe 0.7.0 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2023.1.0 h213fc3f_46344 defaults
mmcv 2.2.0 pypi_0 pypi
mmdeploy 1.3.1 dev_0 <develop>
mmdeploy-runtime-gpu 1.3.1 pypi_0 pypi
mmdet 3.3.0 dev_0 <develop>
mmengine 0.10.4 pypi_0 pypi
model-index 0.1.11 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
nettle 3.6 he412f7d_0 conda-forge
nodeenv 1.8.0 pypi_0 pypi
numpy 1.26.4 py310hb13e2d6_0 conda-forge
nvidia-nvimgcodec-cu11 0.2.0.7 pypi_0 pypi
nvidia-nvjpeg-cu11 11.9.0.86 pypi_0 pypi
nvtx 0.2.10 pypi_0 pypi
onnx 1.16.0 pypi_0 pypi
onnxruntime 1.12.1 pypi_0 pypi
opencv-python 4.9.0.80 pypi_0 pypi
opendatalab 0.0.10 pypi_0 pypi
openh264 2.1.1 h780b84a_0 conda-forge
openjpeg 2.4.0 h3ad879b_0 defaults
openmim 0.3.9 pypi_0 pypi
openpyxl 3.1.0 pypi_0 pypi
openssl 3.3.0 hd590300_0 conda-forge
openxlab 0.0.38 pypi_0 pypi
ordered-set 4.1.0 pypi_0 pypi
oss2 2.17.0 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
pillow 10.3.0 py310h5eee18b_0 defaults
pip 24.0 py310h06a4308_0 defaults
platformdirs 4.2.1 pypi_0 pypi
pluggy 1.5.0 pypi_0 pypi
pre-commit 3.7.0 pypi_0 pypi
prettytable 3.10.0 pypi_0 pypi
protobuf 3.20.2 pypi_0 pypi
psutil 5.9.8 pypi_0 pypi
py 1.11.0 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
pycocotools 2.0.7 pypi_0 pypi
pycodestyle 2.11.1 pypi_0 pypi
pycparser 2.22 pypi_0 pypi
pycryptodome 3.20.0 pypi_0 pypi
pycuda 2024.1 pypi_0 pypi
pyflakes 3.2.0 pypi_0 pypi
pygments 2.18.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pytest 8.2.0 pypi_0 pypi
python 3.10.14 h955ad1f_1 defaults
python-dateutil 2.9.0.post0 pypi_0 pypi
python_abi 3.10 2_cp310 conda-forge
pytools 2024.1.2 pypi_0 pypi
pytorch 1.12.1 py3.10_cuda11.6_cudnn8.3.2_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2023.4 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
requests 2.28.2 pypi_0 pypi
rich 13.4.2 pypi_0 pypi
safetensors 0.4.3 pypi_0 pypi
scipy 1.13.0 pypi_0 pypi
seaborn 0.13.2 pypi_0 pypi
setuptools 60.2.0 pypi_0 pypi
shapely 2.0.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
smmap 5.0.1 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0 defaults
sympy 1.12 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tbb 2021.9.0 hf52228f_0 conda-forge
tensorrt 8.6.1 pypi_0 pypi
termcolor 2.4.0 pypi_0 pypi
terminaltables 3.1.10 pypi_0 pypi
thop 0.1.1-2209072238 pypi_0 pypi
timm 0.9.16 pypi_0 pypi
tk 8.6.14 h39e8969_0 defaults
tomli 2.0.1 pypi_0 pypi
torchaudio 0.12.1 py310_cu116 pytorch
torchnvjpeg 0.1.0 pypi_0 pypi
torchvision 0.13.1 py310_cu116 pytorch
tqdm 4.65.2 pypi_0 pypi
typing_extensions 4.11.0 pyha770c72_0 conda-forge
tzdata 2024.1 pypi_0 pypi
ultralytics 8.2.11 pypi_0 pypi
urllib3 1.26.18 pypi_0 pypi
virtualenv 20.26.1 pypi_0 pypi
wcwidth 0.2.13 pypi_0 pypi
wheel 0.43.0 py310h06a4308_0 defaults
xlrd 1.2.0 pypi_0 pypi
xz 5.4.6 h5eee18b_1 defaults
yapf 0.40.2 pypi_0 pypi
zipp 3.18.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_1 defaults
zstd 1.5.5 hc292b87_2 defaults