
trtexec uses half the memory of python API

xrstokes opened this issue 3 years ago · 2 comments

Description

Sorry to bother you. This may be normal or expected behavior. If I run:

!/usr/src/tensorrt/bin/trtexec --loadEngine=yolov7-tiny-nms.trt --batch=1

I get this output:

[08/08/2022-01:36:09] [I] [TRT] [MemUsageChange] Init CUDA: CPU +229, GPU +0, now: CPU 269, GPU 1793 (MiB)
[08/08/2022-01:36:09] [I] [TRT] Loaded engine size: 21 MiB
[08/08/2022-01:36:11] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +159, GPU +182, now: CPU 434, GPU 1981 (MiB)
[08/08/2022-01:36:13] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +240, GPU +241, now: CPU 674, GPU 2222 (MiB)
[08/08/2022-01:36:13] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +20, now: CPU 0, GPU 20 (MiB)
[08/08/2022-01:36:13] [I] Engine loaded in 5.11544 sec.
[08/08/2022-01:36:13] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 653, GPU 2200 (MiB)
[08/08/2022-01:36:13] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +1, now: CPU 653, GPU 2201 (MiB)
[08/08/2022-01:36:13] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 87 (MiB)

If I run:

import numpy as np
import tensorrt as trt
import torch
from collections import OrderedDict, namedtuple

# Assumes `w` (path to the serialized engine file) and `device` (a torch
# device) are defined elsewhere.
Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))
logger = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(logger, namespace="")
with open(w, 'rb') as f, trt.Runtime(logger) as runtime:
    model = runtime.deserialize_cuda_engine(f.read())
bindings = OrderedDict()
for index in range(model.num_bindings):
    name = model.get_binding_name(index)
    dtype = trt.nptype(model.get_binding_dtype(index))
    shape = tuple(model.get_binding_shape(index))
    # Pre-allocate a device buffer for each binding and record its address.
    data = torch.from_numpy(np.empty(shape, dtype=np.dtype(dtype))).to(device)
    bindings[name] = Binding(name, dtype, shape, data, int(data.data_ptr()))
binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())
context = model.create_execution_context()

I get this output:

[08/08/2022-01:34:02] [TRT] [I] [MemUsageChange] Init CUDA: CPU +230, GPU +0, now: CPU 319, GPU 2113 (MiB)
[08/08/2022-01:34:02] [TRT] [I] Loaded engine size: 21 MiB
[08/08/2022-01:34:03] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +163, now: CPU 504, GPU 2324 (MiB)
[08/08/2022-01:34:05] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +241, GPU +289, now: CPU 745, GPU 2613 (MiB)
[08/08/2022-01:34:05] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +20, now: CPU 0, GPU 20 (MiB)
[08/08/2022-01:34:32] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +3, now: CPU 2066, GPU 3638 (MiB)
[08/08/2022-01:34:32] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU -1, now: CPU 2066, GPU 3634 (MiB)
[08/08/2022-01:34:32] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +67, now: CPU 0, GPU 87 (MiB)

I don't have enough RAM left over for other things if I use the Python API, but I don't know how to load the model and do inference using trtexec. Can someone help me reduce my RAM usage, please? It is only a small yolov7-tiny model, and it works great otherwise: 66 ms inference.

Environment

**TensorRT Version**: "nvcr.io/nvidia/l4t-tensorrt:r8.2.1-runtime"
**NVIDIA GPU**: Jetson Nano
**NVIDIA Driver Version**: ??
**CUDA Version**: 10.2
**CUDNN Version**:
**Operating System**: JetPack 4.6
**Python Version (if applicable)**: 3.8.0
**Tensorflow Version (if applicable)**:
**PyTorch Version (if applicable)**: 1.10
**Baremetal or Container (if so, version)**:

Relevant Files

Steps To Reproduce

xrstokes avatar Aug 08 '22 01:08 xrstokes

Perhaps you can try disabling the cuBLAS/cuDNN tactic sources; they consume a lot of memory. See https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html?highlight=tactic%20source#tensorrt.IBuilderConfig.set_tactic_sources

I would also suggest using the C++ API to get finer memory control.

zerollzeng avatar Aug 08 '22 09:08 zerollzeng
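As an aside on the tactic-source suggestion above, here is a minimal sketch of how the mask passed to `set_tactic_sources` can be built at engine-build time. The `mask_for` helper and the plain-int constants are hypothetical stand-ins; the ordinals mirror `trt.TacticSource` (CUBLAS=0, CUBLAS_LT=1, CUDNN=2) in the TensorRT 8.x Python API, and the actual `config` call is shown only in comments.

```python
# Sketch: restrict tactic sources so TensorRT never loads cuBLAS/cuDNN,
# which is where much of the CPU/GPU memory in the logs above goes.
# Plain ints mirroring trt.TacticSource in TRT 8.x (hypothetical constants):
CUBLAS, CUBLAS_LT, CUDNN = 0, 1, 2

def mask_for(sources):
    """Combine tactic-source ordinals into the bitmask set_tactic_sources() expects."""
    mask = 0
    for s in sources:
        mask |= 1 << s
    return mask

# At build time, on the builder config (not runnable here without tensorrt):
#   config.set_tactic_sources(mask_for([]))        # core TRT tactics only
#   config.set_tactic_sources(mask_for([CUBLAS]))  # keep only cuBLAS
```

Note this is a build-time setting, so the engine would have to be rebuilt; an already-serialized engine keeps the tactics it was built with.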

Using FP16 or INT8 would also help.

zerollzeng avatar Aug 08 '22 09:08 zerollzeng
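To put the reduced-precision suggestion in rough perspective: every dense weight and activation buffer is half (FP16) or a quarter (INT8) the size of its FP32 counterpart. The back-of-the-envelope calculation below uses a hypothetical 640x640 YOLO-style input shape; the actual precision is requested on the builder config at build time.

```python
# Rough illustration of why lower precision cuts memory: buffer size scales
# linearly with element size. Shape below is a hypothetical YOLO input batch.
from math import prod

def buffer_bytes(shape, itemsize):
    """Size in bytes of a dense tensor with the given per-element size."""
    return prod(shape) * itemsize

shape = (1, 3, 640, 640)
fp32 = buffer_bytes(shape, 4)
fp16 = buffer_bytes(shape, 2)
int8 = buffer_bytes(shape, 1)
# At build time, precision is requested on the builder config, e.g.:
#   config.set_flag(trt.BuilderFlag.FP16)   # TensorRT 8.x Python API
```

Like the tactic-source setting, this only applies when building the engine, so it would mean re-exporting the yolov7-tiny engine rather than changing how the existing one is loaded.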

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!

ttyio avatar Dec 06 '22 01:12 ttyio