djl
djl fails with dynamic input shapes
Description
Using the same ONNX model, I converted it to a TensorRT engine with trtexec. With a fixed input shape, DJL runs fine:
TensorRT-8.4.1.5/bin/trtexec --onnx=models/model.onnx --shapes=input_ids:1x4 --fp16 --saveEngine=model8415-1*4.trt
output:
DEBUG [main] 2023-01-05 00:47:05 Registering EngineProvider: TensorRT
DEBUG [main] 2023-01-05 00:47:05 Registering EngineProvider: TensorFlow
DEBUG [main] 2023-01-05 00:47:05 Registering EngineProvider: MXNet
DEBUG [main] 2023-01-05 00:47:05 Found default engine: MXNet
DEBUG [main] 2023-01-05 00:47:05 Loading TensorRT JNI library from: /root/.djl.ai/tensorrt/8.4.1-0.19.0-linux-x86_64/libdjl_trt.so
DEBUG [main] 2023-01-05 00:47:05 Scanning models in repo: class ai.djl.repository.SimpleRepository, file:/tensorrt/model8415-1*4.trt
DEBUG [main] 2023-01-05 00:47:05 Loading model with Criteria:
Application: UNDEFINED
Input: class djl.input.TensorRTInput
Output: class djl.output.TensorRTOutput
Engine: TensorRT
ModelZoo: ai.djl.localmodelzoo
DEBUG [main] 2023-01-05 00:47:05 Searching model in specified model zoo: ai.djl.localmodelzoo
WARN [main] 2023-01-05 00:47:05 Simple repository pointing to a non-archive file.
DEBUG [main] 2023-01-05 00:47:05 Checking ModelLoader: ai.djl.localmodelzoo:model8415-1*4.trt UNDEFINED [
ai.djl.localmodelzoo/model8415-1*4.trt/model8415-1*4.trt {}
]
DEBUG [main] 2023-01-05 00:47:05 Preparing artifact: file:/tensorrt/model8415-1*4.trt, ai.djl.localmodelzoo/model8415-1*4.trt/model8415-1*4.trt {}
DEBUG [main] 2023-01-05 00:47:05 Skip prepare for local repository.
Loading: 100% |████████████████████████████████████████|
DEBUG [main] 2023-01-05 00:47:05 Using cache dir: /root/.djl.ai/mxnet/1.9.1-cu114mkl-linux-x86_64
DEBUG [main] 2023-01-05 00:47:05 Loading mxnet library from: /root/.djl.ai/mxnet/1.9.1-cu114mkl-linux-x86_64/libmxnet.so
DEBUG [main] 2023-01-05 00:47:07 Using cache dir: /root/.djl.ai/tensorflow
DEBUG [main] 2023-01-05 00:47:07 Loading TensorFlow library from: /root/.djl.ai/tensorflow/2.7.4-cu114-linux-x86_64/libjnitensorflow.so
DEBUG [main] 2023-01-05 00:47:07 Loading TensorRT UFF model /tensorrt/model8415-1*4.trt with options:
[TRT] INFO: [MemUsageChange] Init CUDA: CPU +273, GPU +0, now: CPU 1200, GPU 491 (MiB)
[TRT] INFO: Loaded engine size: 681 MiB
[TRT] INFO: [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +680, now: CPU 0, GPU 680 (MiB)
[TRT] INFO: [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2, now: CPU 0, GPU 682 (MiB)
DEBUG [main] 2023-01-05 00:47:11 Model information:
DEBUG [main] 2023-01-05 00:47:11 input_0[input_ids]: int32, (1, 4)
DEBUG [main] 2023-01-05 00:47:11 output_0[output]: float32, (1, 4, 51200)
load model success
result size: 1
result: output: (4, 51200) cpu() float32
[ Exceed max print size ]
[[F@f237ae7, [F@42edde25, [F@6fe5da76, [F@77d95e5a]
With a dynamic input shape, the run crashes with the message "Cuda failure: 2" and aborts, although no core dump file is produced:
TensorRT-8.4.1.5/bin/trtexec --onnx=models/model.onnx --minShapes=input_ids:1x1 --maxShapes=input_ids:1x16 --optShapes=input_ids:1x4 --workspace=3072 --fp16 --saveEngine=model8415-dynamic.trt
output:
DEBUG [main] 2023-01-05 02:58:57 Scanning models in repo: class ai.djl.repository.SimpleRepository, file:/tensorrt/model8415-dynamic.trt
DEBUG [main] 2023-01-05 02:58:57 Loading model with Criteria:
Application: UNDEFINED
Input: class djl.input.TensorRTInput
Output: class djl.output.TensorRTOutput
Engine: TensorRT
ModelZoo: ai.djl.localmodelzoo
DEBUG [main] 2023-01-05 02:58:57 Searching model in specified model zoo: ai.djl.localmodelzoo
DEBUG [main] 2023-01-05 02:58:57 Registering EngineProvider: TensorRT
DEBUG [main] 2023-01-05 02:58:57 Registering EngineProvider: TensorFlow
DEBUG [main] 2023-01-05 02:58:57 Registering EngineProvider: MXNet
DEBUG [main] 2023-01-05 02:58:57 Found default engine: MXNet
WARN [main] 2023-01-05 02:58:57 Simple repository pointing to a non-archive file.
DEBUG [main] 2023-01-05 02:58:57 Checking ModelLoader: ai.djl.localmodelzoo:model8415-dynamic.trt UNDEFINED [
ai.djl.localmodelzoo/model8415-dynamic.trt/model8415-dynamic.trt {}
]
DEBUG [main] 2023-01-05 02:58:57 Preparing artifact: file:/tensorrt/model8415-dynamic.trt, ai.djl.localmodelzoo/model8415-dynamic.trt/model8415-dynamic.trt {}
DEBUG [main] 2023-01-05 02:58:57 Skip prepare for local repository.
Loading: 100% |████████████████████████████████████████|
DEBUG [main] 2023-01-05 02:58:58 Loading TensorRT JNI library from: /root/.djl.ai/tensorrt/8.4.1-0.19.0-linux-x86_64/libdjl_trt.so
DEBUG [main] 2023-01-05 02:58:58 Using cache dir: /root/.djl.ai/mxnet/1.9.1-cu114mkl-linux-x86_64
DEBUG [main] 2023-01-05 02:58:58 Loading mxnet library from: /root/.djl.ai/mxnet/1.9.1-cu114mkl-linux-x86_64/libmxnet.so
DEBUG [main] 2023-01-05 02:58:59 Using cache dir: /root/.djl.ai/tensorflow
DEBUG [main] 2023-01-05 02:58:59 Loading TensorFlow library from: /root/.djl.ai/tensorflow/2.7.4-cu114-linux-x86_64/libjnitensorflow.so
DEBUG [main] 2023-01-05 02:58:59 Loading TensorRT UFF model /tensorrt/model8415-dynamic.trt with options:
[TRT] INFO: [MemUsageChange] Init CUDA: CPU +273, GPU +0, now: CPU 1538, GPU 491 (MiB)
[TRT] INFO: Loaded engine size: 1006 MiB
[TRT] INFO: [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1001, now: CPU 0, GPU 1001 (MiB)
[TRT] INFO: [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +8, now: CPU 0, GPU 1009 (MiB)
Cuda failure: 2
Aborted (core dumped)
It is worth noting that model8415-dynamic.trt works fine when used from the TensorRT Python API.
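For comparison, this is a minimal sketch of how the dynamic-shape engine is typically driven from the TensorRT Python API (the engine file name and the input/output names and shapes are taken from the logs above; the rest is an illustrative assumption, requires a GPU plus the `tensorrt` and `pycuda` packages, and is not the reporter's actual demo code):

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.INFO)
with open("model8415-dynamic.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# For a dynamic-shape engine, the concrete input shape must be set
# (within minShapes..maxShapes) before buffers are sized and executed.
input_ids = np.array([[1, 2, 3, 4]], dtype=np.int32)
context.set_binding_shape(0, input_ids.shape)

# Output shape is only resolved after the input shape is set.
out_shape = tuple(context.get_binding_shape(1))
output = np.empty(out_shape, dtype=np.float32)

d_in = cuda.mem_alloc(input_ids.nbytes)
d_out = cuda.mem_alloc(output.nbytes)
cuda.memcpy_htod(d_in, input_ids)
context.execute_v2([int(d_in), int(d_out)])
cuda.memcpy_dtoh(output, d_out)
print(output.shape)
```

The key difference from the fixed-shape case is the explicit `set_binding_shape` call before allocation and execution; if the Java/JNI path sizes its device buffers from the engine's (dynamic, i.e. -1) dimensions instead, a bad allocation would be consistent with the crash seen here.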
Environment Info
CUDA: 11.4, TensorRT: 8.4.1.5, OS: Ubuntu 18.04
Hi, thanks for bringing up this issue. I see that in the second case the reported error is Cuda failure: 2. This maps to the insufficient-memory error cudaErrorMemoryAllocation (https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html). Have you tried increasing the workspace size with trtexec (or alternatively memPoolSize, since the workspace flag has been deprecated)? Does the GPU have sufficient memory (I am assuming yes, since you indicate this works via Python)?
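For reference, the number in "Cuda failure: N" is the raw cudaError_t value; a small lookup over the first few codes, taken from the CUDA Runtime API documentation (an illustrative helper, not part of DJL):

```python
# Subset of cudaError_t codes from the CUDA Runtime API documentation.
CUDA_ERRORS = {
    0: "cudaSuccess",
    1: "cudaErrorInvalidValue",
    2: "cudaErrorMemoryAllocation",
    3: "cudaErrorInitializationError",
}


def describe_cuda_failure(code: int) -> str:
    """Return the symbolic name for a CUDA runtime error code."""
    return CUDA_ERRORS.get(code, f"unknown error ({code})")


print(describe_cuda_failure(2))  # cudaErrorMemoryAllocation
```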
If the workspace/memPoolSize increase doesn't solve this issue, can you provide the onnx model you are using so we can work on reproducing the issue?
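For example, the dynamic engine could be rebuilt with a larger memory pool using the newer flag (the 4096 MiB value is only an illustrative guess; adjust to what the GPU allows):

```shell
# --memPoolSize=workspace:<MiB> replaces the deprecated --workspace flag in trtexec 8.4+
TensorRT-8.4.1.5/bin/trtexec --onnx=models/model.onnx \
    --minShapes=input_ids:1x1 --optShapes=input_ids:1x4 --maxShapes=input_ids:1x16 \
    --memPoolSize=workspace:4096 --fp16 --saveEngine=model8415-dynamic.trt
```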
Hi siddvenk, thanks for the help. I cannot share this ONNX model for some reason, so I'll try to find an open-source model that reproduces the problem; please give me some time.
Hi siddvenk, the ONNX model and demo code have been sent to your email ([email protected]); please take a look and try to reproduce when you are free, thanks.