Error during Model Conversion Process - Impact Inquiry

Open · liuweixue001 opened this issue · 6 comments

Hello,

I hope this message finds you well. I followed the tutorial and the model conversion completed successfully; however, an error was printed during the conversion process. I am seeking clarification on the potential impact of this error.

The specific error message I encountered is as follows:

```
[08/23/2023-10:06:30] [V] [TRT] Engine Layer Information: Layer(DLA): {ForeignNode[/model.0/conv/Conv.../model.24/m.2/Conv]}, Tactic: 0x0000000000000003, images (Half[1,3:16,672,672]) -> s8 (Half[1,255:16,84,84]), s16 (Half[1,255:16,42,42]), s32 (Half[1,255:16,21,21])
[08/23/2023-10:06:30] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +14, GPU +0, now: CPU 14, GPU 0 (MiB)
[08/23/2023-10:06:30] [I] Engine built in 13.8529 sec.
[08/23/2023-10:06:30] [I] [TRT] Loaded engine size: 14 MiB
[08/23/2023-10:06:30] [E] Error[9]: Cannot deserialize serialized engine built with EngineCapability::kDLA_STANDALONE, use cuDLA APIs instead.
[08/23/2023-10:06:30] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::65] Error Code 4: Internal Error (Engine deserialization failed.)
[08/23/2023-10:06:30] [E] Engine deserialization failed
[08/23/2023-10:06:30] [I] Skipped inference phase since --buildOnly is added.
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=data/model/yolov5s_trimmed_reshape_tranpose.onnx --verbose --fp16 --saveEngine=data/loadable/yolov5.fp16.fp16chw16in.fp16chw16out.standalone.bin --inputIOFormats=fp16:chw16 --outputIOFormats=fp16:chw16 --buildDLAStandalone --useDLACore=0
```

Could you clarify the potential consequences of this error? Does it affect the converted model's functionality or performance?

Thank you very much for your assistance.

liuweixue001 commented on Aug 23, 2023
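
For context, the trailing error appears expected rather than harmful: the build itself reports PASSED, and a loadable built with --buildDLAStandalone cannot be deserialized by the TensorRT runtime at all; as the message itself says, it has to be loaded through the cuDLA API. Below is a minimal loading sketch in hybrid mode, assuming the standard cudla.h API; the readFile helper is illustrative, not part of cuDLA:

```cpp
// Minimal sketch: load a DLA standalone loadable with the cuDLA API
// (hybrid mode). Error handling is abbreviated.
#include <cudla.h>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Illustrative helper: read the serialized loadable from disk.
static std::vector<uint8_t> readFile(const char* path) {
    std::ifstream f(path, std::ios::binary);
    return std::vector<uint8_t>((std::istreambuf_iterator<char>(f)),
                                std::istreambuf_iterator<char>());
}

int main() {
    std::vector<uint8_t> blob =
        readFile("data/loadable/yolov5.fp16.fp16chw16in.fp16chw16out.standalone.bin");

    cudlaDevHandle dev = nullptr;
    cudlaModule module = nullptr;

    // DLA core 0 in CUDA/DLA hybrid mode.
    if (cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA) != cudlaSuccess) return 1;

    // This is the step that succeeds where trtexec's deserializeCudaEngine
    // fails: a standalone loadable is meant for cuDLA, not the TRT runtime.
    if (cudlaModuleLoadFromMemory(dev, blob.data(), blob.size(), &module, 0)
        != cudlaSuccess) return 1;

    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return 0;
}
```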

I encountered the same problem, but the code still ran successfully. However, the DLA model's inference time is about 30 ms, much longer than the ~4 ms in the tutorial:

```
./build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8

DLA CTX INIT !!!
ALL MEMORY REGISTERED SUCCESSFULLY
Run Yolov5 DLA pipeline for ./data/images/image.jpg
SUBMIT CUDLA TASK
Input Tensor Num: 1
Output Tensor Num: 3
SUBMIT IS DONE !!!
Inference time: 30.567 ms
Num object detect: 919
detect result has been write to result.jpg
```

mrfsc commented on Aug 25, 2023
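
For reference, here is a sketch of how a number like the one above could be measured, using CUDA events around the hybrid-mode submission. The dev, task, and stream handles and the cudlaTask population are assumed to be set up as in the sample app; timedSubmit is an illustrative name:

```cpp
// Sketch: time a cuDLA hybrid-mode submission with CUDA events.
#include <cuda_runtime.h>
#include <cudla.h>
#include <cstdio>

void timedSubmit(cudlaDevHandle dev, const cudlaTask* task, cudaStream_t stream) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Record events around the stream-ordered DLA task submission.
    cudaEventRecord(start, stream);
    cudlaSubmitTask(dev, task, 1 /* numTasks */, stream, 0 /* flags */);
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("Inference time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```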

I get the same error on Jetson AGX Orin. I think TensorRT 8.6.0 may be needed, but JetPack only ships TensorRT 8.5.3, so when trtexec processes the ONNX model, the DLA-standalone feature may not be supported. Maybe the repo needs to provide Docker images?

DC-Zhou commented on Sep 6, 2023
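
If a TensorRT version mismatch is the suspicion, one quick check is to compare the version the binary was compiled against with the library actually loaded at runtime. A minimal sketch using the public getInferLibVersion() entry point and the NV_TENSORRT_* version macros:

```cpp
// Sketch: print compile-time vs. runtime TensorRT versions.
#include <NvInfer.h>  // pulls in NvInferVersion.h and getInferLibVersion()
#include <cstdio>

int main() {
    std::printf("compiled against TensorRT %d.%d.%d\n",
                NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH);
    // Returns the linked library's version as an integer,
    // e.g. 8503 for TensorRT 8.5.3.
    std::printf("runtime reports %d\n", getInferLibVersion());
    return 0;
}
```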

> I encountered the same problem ... the actual inference time is about 30ms

Which DRIVE OS / JetPack version are you using? You need DRIVE OS 6.0.8.0+ or JetPack 6.0+ to get the perf in our README.

zerollzeng commented on Sep 10, 2023

> Which DRIVE OS / JetPack version are you using? You need DRIVE OS 6.0.8.0+ or JetPack 6.0+ to get the perf in our README.

Thanks for the reply. My JetPack version is 5.1.2, which is the latest version available in the JetPack archive (https://developer.nvidia.com/embedded/jetpack-archive). How can I get JetPack 6.0+? Or is there a Docker image to verify the performance?

mrfsc commented on Sep 11, 2023

Unfortunately no, you have to wait for its release :-(

zerollzeng commented on Sep 11, 2023

> ... You need DRIVE OS 6.0.8.0+ or JetPack 6.0+ to get the perf in our README.

I don't think JetPack 6.0+ works. I tried JetPack 6.0, and it had some other issues when running bash loadle.sh.

jinzhongxiao commented on Jul 9, 2024

result.jpg has no result boxes!

lxzatwowone1 commented on Oct 18, 2024