
Crash when model contains a Cast node

Open BobDLA opened this issue 5 years ago • 14 comments

Got a crash when casting the input from int8 to float32. Is there any suggestion?

Thanks

[image] ( int8->float32 and int8->int32->float32 cause the crash; int32->float32 is OK. )

Input filename:   xxx.onnx
ONNX IR version:  0.0.3
Opset version:    9
Producer name:    pytorch
Producer version: 0.4
Domain:
Model version:    0
Doc string:

Parsing model
[2020-02-28 13:28:41 WARNING] onnx-tensorrt/onnx2trt_utils.cpp:232: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Building TensorRT engine, FP16 available:1
Max batch size: 32
Max workspace size: 1024 MiB
[2020-02-28 13:28:43 BUG] Assertion failed: regionRanges != nullptr
../builder/cudnnBuilder2.cpp:1884
Aborting...

[2020-02-28 13:28:43 ERROR] ../builder/cudnnBuilder2.cpp (1884) - Assertion Error in makePaddedScale: 0 (regionRanges != nullptr)
terminate called after throwing an instance of 'std::runtime_error'
  what():  Failed to create object
Aborted (core dumped)

BobDLA avatar Feb 28 '20 13:02 BobDLA

@BobDLA Have you found any solution on this? I encountered exactly the same error when casting int8 to float32.

I tried both TensorRT 6 and 7, and neither worked for me.

chenlu001 avatar Mar 21 '20 11:03 chenlu001

I met the same problem.

daixiangzi avatar Mar 24 '20 01:03 daixiangzi

Same problem.

rgc183 avatar May 15 '20 02:05 rgc183

I have the same problem. What's puzzling me is that trtexec successfully runs inference with the same ONNX file! Verbose output is identical for both my code (adapted from sampleOnnxMNIST) and trtexec, up to the error mentioned above.

pstadelmann avatar Jun 19 '20 09:06 pstadelmann

Well, actually trtexec was able to run inference because it would use fp32 inputs, which it does by default.

When forced to use int8 inputs via the --inputIOFormats parameter, it stops with the exact same error.

pstadelmann avatar Jul 15 '20 11:07 pstadelmann
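For reference, forcing an int8 input format in trtexec looks roughly like the command below. This is a sketch, not taken from the comment above: model.onnx is a placeholder, and the exact format-spec string may differ between TensorRT versions.

```
trtexec --onnx=model.onnx --inputIOFormats=int8:chw
```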

Same here

tamirgi avatar Jul 28 '20 14:07 tamirgi

What's the use case of casting INT8->FP32 in this way outside of quantization? It looks from the OP that the input is just being cast between different datatypes, so the model can be simplified to have the input type be the final cast datatype to begin with.

kevinch-nv avatar Apr 17 '21 21:04 kevinch-nv

@kevinch-nv In my case, the goal was to reduce the total processing latency by sending 8-bit data instead of fp32 to the GPU.

pstadelmann avatar Apr 19 '21 08:04 pstadelmann

> @kevinch-nv In my case, the goal was to reduce the total processing latency by sending 8-bit data instead of fp32 to the GPU.

Same here

hexiaoyi95 avatar May 13 '21 03:05 hexiaoyi95

Unfortunately, we currently do not support int8->float32 casting outside of quantization. You will have to use full floating point precision to provide the data to TensorRT.

kevinch-nv avatar Jul 08 '21 16:07 kevinch-nv

This seems like a long-standing issue that hasn't been resolved yet. I actually found a workaround last year and hope it helps.

I wrote a CUDA kernel which accepts int8 from the CPU and casts it to float on the GPU, then proceeds with TensorRT as the normal process.

chenlu001 avatar Jul 15 '21 01:07 chenlu001
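For readers looking for the same workaround, a minimal sketch of such a kernel could look like the following. The names and buffer handling are illustrative assumptions, not chenlu001's actual code; the device buffers are assumed to be allocated once up front, and the TensorRT execution context is then enqueued on the same stream with the float buffer bound as the input.

```cuda
// Copy the compact int8 buffer to the device and widen it to float32 on the
// GPU, so the host-to-device transfer moves 1 byte per element instead of 4.
#include <cuda_runtime.h>
#include <cstdint>

__global__ void cast_int8_to_float(const int8_t* src, float* dst, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = static_cast<float>(src[i]);  // add scaling/normalization here if needed
    }
}

// h_int8:  host buffer holding the raw int8 input (ideally pinned memory)
// d_int8:  device staging buffer of n bytes
// d_float: device float32 buffer that is bound as the TensorRT input
void upload_and_cast(const int8_t* h_int8, int8_t* d_int8, float* d_float,
                     int n, cudaStream_t stream) {
    cudaMemcpyAsync(d_int8, h_int8, n * sizeof(int8_t),
                    cudaMemcpyHostToDevice, stream);

    const int block = 256;
    const int grid  = (n + block - 1) / block;
    cast_int8_to_float<<<grid, block, 0, stream>>>(d_int8, d_float, n);
    // enqueue the TensorRT execution context on the same stream afterwards
}
```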

Anyway, I think this would be a useful enhancement for cases where copy latency is a real concern. TensorRT is a powerful tool, but it's frustrating when TensorRT inference takes only about 10 ms while the memory copy takes another 10 ms. Using int8 as input without full quantization can be an acceptable trade-off.

chenlu001 avatar Jul 15 '21 02:07 chenlu001

It's 2022 now, and still no progress?

handoku avatar Jun 29 '22 12:06 handoku

How about int8->int8?

cynthia-1999 avatar Apr 15 '23 10:04 cynthia-1999