❓ [Question] Runtime check of the inference platform FP16 support
❓ Question
Let's assume one converts a TorchScript module to a Torch-TensorRT TorchScript module, requesting FP16 as the inference type. At conversion time, if the GPU doesn't support FP16 (typically a GTX 1060), a nice ::torch_tensorrt::Error is thrown saying
Requested inference in FP16 but platform does not support FP16
That's all good.
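For context, the conversion step I am describing looks roughly like this with the C++ API (a minimal sketch assuming the Torch-TensorRT 1.0 API; the module path and input shape are placeholders):

```cpp
// Sketch of the conversion step: compile a TorchScript module with FP16 enabled.
// Assumes Torch-TensorRT 1.0; "model.ts" and the input shape are placeholders.
#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"

int main() {
  torch::jit::Module mod = torch::jit::load("model.ts");

  // Request FP16 as an enabled precision for the compiled module.
  auto settings = torch_tensorrt::ts::CompileSpec(
      {torch_tensorrt::Input(std::vector<int64_t>{1, 3, 224, 224})});
  settings.enabled_precisions = {torch_tensorrt::DataType::kHalf};

  // On a GPU without FP16 support (e.g. a GTX 1060), this is where
  // ::torch_tensorrt::Error is thrown with the message quoted above.
  auto trt_mod = torch_tensorrt::ts::compile(mod, settings);
  trt_mod.save("model_fp16_trt.ts");
  return 0;
}
```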
Now, it seems that if one tries to run an already-converted Torch-TensorRT TorchScript module with inference type FP16 on a GPU that doesn't support FP16, there is no such check and the program crashes with:
warning: Critical error detected c0000374
Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffac1baf1d3 in ntdll!RtlIsZeroMemory () from C:\WINDOWS\SYSTEM32\ntdll.dll
My questions are:
- Are these observations (still) correct? I am using Torch-TensorRT 1.0, so the behavior might have changed since.
- Is there any plan to also check the device capability at runtime, given that it should be possible to figure out what the inference type of a converted module is (I don't know whether that's possible or easy to do)?
Torch-TensorRT programs should be built for the specific GPU they will be deployed on. So the
already converted Torch-TensorRT TorchScript with an inference type FP16 on a GPU that doesn't support FP16
case is technically an unsupported use case due to TensorRT limitations. In most cases, programs built on a GPU of a particular compute capability can be run on other GPUs of the same compute capability. So in the Torch-TensorRT runtime we check for both the specific GPU model and the compute capability when we go to deserialize. IIRC, FP16 on Pascal is somewhat of a special case, which might be why we see this issue.
Thanks, that makes perfect sense, and I understand that this is not a supported use case. I am just wondering whether there might be a more elegant way to handle that unsupported use case than crashing.
It would be very nice, for example, if trying to run such a converted TorchScript (compiled with FP16 precision enabled) on a GPU that doesn't support FP16 would simply throw an error. But again, I don't know whether that's easily feasible.
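In the meantime, a guard one could add on the application side would look something like the sketch below: ask TensorRT directly whether the current platform has fast FP16 before loading the FP16-compiled module. This is only a sketch; it assumes the application links against TensorRT 8.x itself, and the helper name and file path are made up.

```cpp
// Sketch of an application-side guard: query TensorRT for fast FP16 support
// before loading an FP16-compiled Torch-TensorRT module.
#include <iostream>

#include "NvInfer.h"
#include "torch/script.h"

namespace {
// Minimal logger needed to create a TensorRT builder.
class StderrLogger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) {
      std::cerr << msg << std::endl;
    }
  }
};
}  // namespace

// Returns true if TensorRT reports fast FP16 support on the current GPU.
bool platform_has_fast_fp16() {
  StderrLogger logger;
  nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
  if (!builder) {
    return false;
  }
  const bool supported = builder->platformHasFastFp16();
  delete builder;  // TensorRT 8.x allows deleting interface objects directly.
  return supported;
}

int main() {
  if (!platform_has_fast_fp16()) {
    std::cerr << "This GPU does not support fast FP16; refusing to load the "
                 "FP16-compiled Torch-TensorRT module." << std::endl;
    return 1;
  }
  torch::jit::Module trt_mod = torch::jit::load("model_fp16_trt.ts");
  // ... run inference with trt_mod ...
  return 0;
}
```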
@peri044 Thoughts on expanding the metadata to include information like this? I think we were talking about this a bit earlier. I could see adding platform capability flags to the metadata, but I'm not sure it adds much beyond addressing this special case, since in theory the compute capability check should do the same thing.
@andi4191 now that you are back, thoughts on this feature?
@narendasan: We can check the requested precision against the SM compute capability, as per the hardware precision support matrix: https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix
This check would have to be done at runtime, though.
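Roughly something like the sketch below (illustrative only: the SM-to-FP16 mapping is my reading of the hardware precision matrix linked above, and these function names are not part of the Torch-TensorRT runtime):

```cpp
// Illustrative runtime check: map the current device's SM compute capability
// to FP16 support, per the TensorRT hardware precision matrix.
#include <stdexcept>
#include <string>

#include <cuda_runtime_api.h>

// Per the hardware precision matrix, fast FP16 is listed for SM 5.3, 6.0, 6.2
// and everything from Volta (7.0) onward; notably not for SM 6.1
// (e.g. GTX 1060). This table is illustrative, not exhaustive.
bool sm_supports_fast_fp16(int major, int minor) {
  if (major >= 7) return true;
  if (major == 6) return minor == 0 || minor == 2;
  if (major == 5) return minor == 3;
  return false;
}

// Throws instead of letting the program crash if the engine was built with
// FP16 enabled but the current device cannot run it.
void check_fp16_supported_on_current_device() {
  int device = 0;
  if (cudaGetDevice(&device) != cudaSuccess) {
    throw std::runtime_error("Unable to query the current CUDA device");
  }
  cudaDeviceProp prop{};
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
    throw std::runtime_error("Unable to query CUDA device properties");
  }
  if (!sm_supports_fast_fp16(prop.major, prop.minor)) {
    throw std::runtime_error(
        "Engine was built with FP16 enabled but device sm_" +
        std::to_string(prop.major) + std::to_string(prop.minor) +
        " does not support fast FP16");
  }
}
```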
Well, the question is: will it do anything if engines are not portable across compute capabilities?
I don't think we can do anything other than throw an error instead of letting it crash. IIRC, if the engines are not portable across compute capabilities, TensorRT fails at deserialization.
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
We aren't going to implement this, as Pascal is about to be deprecated in TensorRT.