Error during Model Conversion Process - Impact Inquiry

Open · liuweixue001 opened this issue · 6 comments

Hello,

I hope this message finds you well. I followed the tutorial and the model conversion completed successfully; however, an error was printed during the conversion process. I am seeking clarification on the potential impact of this error.

The specific error message I encountered is as follows:

```
[08/23/2023-10:06:30] [V] [TRT] Engine Layer Information: Layer(DLA): {ForeignNode[/model.0/conv/Conv.../model.24/m.2/Conv]}, Tactic: 0x0000000000000003, images (Half[1,3:16,672,672]) -> s8 (Half[1,255:16,84,84]), s16 (Half[1,255:16,42,42]), s32 (Half[1,255:16,21,21])
[08/23/2023-10:06:30] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +14, GPU +0, now: CPU 14, GPU 0 (MiB)
[08/23/2023-10:06:30] [I] Engine built in 13.8529 sec.
[08/23/2023-10:06:30] [I] [TRT] Loaded engine size: 14 MiB
[08/23/2023-10:06:30] [E] Error[9]: Cannot deserialize serialized engine built with EngineCapability::kDLA_STANDALONE, use cuDLA APIs instead.
[08/23/2023-10:06:30] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::65] Error Code 4: Internal Error (Engine deserialization failed.)
[08/23/2023-10:06:30] [E] Engine deserialization failed
[08/23/2023-10:06:30] [I] Skipped inference phase since --buildOnly is added.
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=data/model/yolov5s_trimmed_reshape_tranpose.onnx --verbose --fp16 --saveEngine=data/loadable/yolov5.fp16.fp16chw16in.fp16chw16out.standalone.bin --inputIOFormats=fp16:chw16 --outputIOFormats=fp16:chw16 --buildDLAStandalone --useDLACore=0
```

Could you clarify the potential consequences of this error? Does it affect the converted model's functionality or performance?

Thank you very much for your assistance.

liuweixue001 commented on Aug 23, 2023
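
For context, the trailing error appears expected rather than harmful: the build itself reports PASSED, and a loadable built with --buildDLAStandalone cannot be deserialized by the TensorRT runtime at all; as the message itself says, it has to be loaded through the cuDLA API. Below is a minimal loading sketch in hybrid mode, assuming the standard cudla.h API; the readFile helper is illustrative, not part of cuDLA:

```cpp
// Minimal sketch: load a DLA standalone loadable with the cuDLA API
// (hybrid mode). Error handling is abbreviated.
#include <cudla.h>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Illustrative helper: read the serialized loadable from disk.
static std::vector<uint8_t> readFile(const char* path) {
    std::ifstream f(path, std::ios::binary);
    return std::vector<uint8_t>((std::istreambuf_iterator<char>(f)),
                                std::istreambuf_iterator<char>());
}

int main() {
    std::vector<uint8_t> blob =
        readFile("data/loadable/yolov5.fp16.fp16chw16in.fp16chw16out.standalone.bin");

    cudlaDevHandle dev = nullptr;
    cudlaModule module = nullptr;

    // DLA core 0 in CUDA/DLA hybrid mode.
    if (cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA) != cudlaSuccess) return 1;

    // This is the step that succeeds where trtexec's deserializeCudaEngine
    // fails: a standalone loadable is meant for cuDLA, not the TRT runtime.
    if (cudlaModuleLoadFromMemory(dev, blob.data(), blob.size(), &module, 0)
        != cudlaSuccess) return 1;

    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return 0;
}
```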

I encountered the same problem, but the code still ran successfully. However, the DLA model's inference time is about 30 ms, much longer than the ~4 ms in the tutorial:

```
./build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8

DLA CTX INIT !!!
ALL MEMORY REGISTERED SUCCESSFULLY
Run Yolov5 DLA pipeline for ./data/images/image.jpg
SUBMIT CUDLA TASK
Input Tensor Num: 1
Output Tensor Num: 3
SUBMIT IS DONE !!!
Inference time: 30.567 ms
Num object detect: 919
detect result has been write to result.jpg
```

mrfsc commented on Aug 25, 2023
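
For reference, here is a sketch of how a number like the one above could be measured, using CUDA events around the hybrid-mode submission. The dev, task, and stream handles and the cudlaTask population are assumed to be set up as in the sample app; timedSubmit is an illustrative name:

```cpp
// Sketch: time a cuDLA hybrid-mode submission with CUDA events.
#include <cuda_runtime.h>
#include <cudla.h>
#include <cstdio>

void timedSubmit(cudlaDevHandle dev, const cudlaTask* task, cudaStream_t stream) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Record events around the stream-ordered DLA task submission.
    cudaEventRecord(start, stream);
    cudlaSubmitTask(dev, task, 1 /* numTasks */, stream, 0 /* flags */);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("Inference time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```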

I get the same error on Jetson AGX Orin. I think TensorRT 8.6.0 may be needed, but JetPack only ships TensorRT 8.5.3, so when trtexec processes the ONNX model, the DLA-standalone feature may not be supported. Maybe the repo needs to provide Docker images?

DC-Zhou commented on Sep 6, 2023
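
If a TensorRT version mismatch is the suspicion, one quick check is to compare the version the binary was compiled against with the library actually loaded at runtime. A minimal sketch using the public getInferLibVersion() entry point and the NV_TENSORRT_* version macros:

```cpp
// Sketch: print compile-time vs. runtime TensorRT versions.
#include <NvInfer.h>  // pulls in NvInferVersion.h and getInferLibVersion()
#include <cstdio>

int main() {
    std::printf("compiled against TensorRT %d.%d.%d\n",
                NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH);
    // Returns the linked library's version as an integer,
    // e.g. 8503 for TensorRT 8.5.3.
    std::printf("runtime reports %d\n", getInferLibVersion());
    return 0;
}
```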

> I encountered the same problem ... the actual inference time is about 30ms

Which DRIVE OS / JetPack version are you using? You need DRIVE OS 6.0.8.0+ or JetPack 6.0+ to get the perf in our README.

zerollzeng commented on Sep 10, 2023

> Which DRIVE OS / JetPack version are you using? You need DRIVE OS 6.0.8.0+ or JetPack 6.0+ to get the perf in our README.

Thanks for the reply. My JetPack version is 5.1.2, which is the latest version available in the JetPack archive (https://developer.nvidia.com/embedded/jetpack-archive). How can I get JetPack 6.0+? Or is there a Docker image to verify the performance?

mrfsc commented on Sep 11, 2023

Unfortunately no, you have to wait for its release :-(

zerollzeng commented on Sep 11, 2023

> ... You need DRIVE OS 6.0.8.0+ or JetPack 6.0+ to get the perf in our README.

I don't think JetPack 6.0+ works. I tried JetPack 6.0, and it had some other issues when running bash loadle.sh.

jinzhongxiao commented on Jul 9, 2024

result.jpg has no result boxes!

lxzatwowone1 commented on Oct 18, 2024