❓ [Question] Runtime check of the inference platform FP16 support
❓ Question
Let's assume one converts a TorchScript module to a Torch-TensorRT TorchScript module, requesting FP16 as the inference type. At conversion time, if the GPU doesn't support FP16 (typically a GTX 1060), a nice ::torch_tensorrt::Error is thrown saying
Requested inference in FP16 but platform does not support FP16
That's all good.
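For context, the conversion step I am describing looks roughly like this with the C++ API (a minimal sketch assuming the Torch-TensorRT 1.0 API; the module path and input shape are placeholders):

```cpp
// Sketch of the conversion step: compile a TorchScript module with FP16 enabled.
// Assumes Torch-TensorRT 1.0; "model.ts" and the input shape are placeholders.
#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"

int main() {
  torch::jit::Module mod = torch::jit::load("model.ts");

  // Request FP16 as an enabled precision for the compiled module.
  auto settings = torch_tensorrt::ts::CompileSpec(
      {torch_tensorrt::Input(std::vector<int64_t>{1, 3, 224, 224})});
  settings.enabled_precisions = {torch_tensorrt::DataType::kHalf};

  // On a GPU without FP16 support (e.g. a GTX 1060), this is where
  // ::torch_tensorrt::Error is thrown with the message quoted above.
  auto trt_mod = torch_tensorrt::ts::compile(mod, settings);
  trt_mod.save("model_fp16_trt.ts");
  return 0;
}
```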
Now, it seems that if one tries to run an already-converted Torch-TensorRT TorchScript module with inference type FP16 on a GPU that doesn't support FP16, there is no such check and the program crashes with:
warning: Critical error detected c0000374
Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffac1baf1d3 in ntdll!RtlIsZeroMemory () from C:\WINDOWS\SYSTEM32\ntdll.dll
My questions are:
- Are these observations (still) correct? I am using Torch-TensorRT 1.0, so the behavior might have changed since.
- Is there any plan to also check the device capability at runtime, given that it should be possible to figure out what the inference type of a converted module is (I don't know whether that's possible or easy to do)?
Torch-TensorRT programs should be built for the specific GPU they will be deployed on. So the
already converted Torch-TensorRT TorchScript with an inference type FP16 on a GPU that doesn't support FP16
case is technically an unsupported use case due to TensorRT limitations. In most cases, programs built on a GPU of a particular compute capability can be run on other GPUs of the same compute capability. So in the Torch-TensorRT runtime we check for both the specific GPU model and the compute capability when we go to deserialize. IIRC, FP16 on Pascal is somewhat of a special case, which might be why we see this issue.
Thanks, that makes perfect sense, and I understand that this is not a supported use case. I am just wondering whether there might be a more elegant way to handle that unsupported use case than crashing.
It would be very nice, for example, if trying to run such a converted TorchScript (compiled with FP16 precision enabled) on a GPU that doesn't support FP16 would simply throw an error. But again, I don't know whether that's easily feasible.
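In the meantime, a guard one could add on the application side would look something like the sketch below: ask TensorRT directly whether the current platform has fast FP16 before loading the FP16-compiled module. This is only a sketch; it assumes the application links against TensorRT 8.x itself, and the helper name and file path are made up.

```cpp
// Sketch of an application-side guard: query TensorRT for fast FP16 support
// before loading an FP16-compiled Torch-TensorRT module.
#include <iostream>

#include "NvInfer.h"
#include "torch/script.h"

namespace {
// Minimal logger needed to create a TensorRT builder.
class StderrLogger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) {
      std::cerr << msg << std::endl;
    }
  }
};
}  // namespace

// Returns true if TensorRT reports fast FP16 support on the current GPU.
bool platform_has_fast_fp16() {
  StderrLogger logger;
  nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
  if (!builder) {
    return false;
  }
  const bool supported = builder->platformHasFastFp16();
  delete builder;  // TensorRT 8.x allows deleting interface objects directly.
  return supported;
}

int main() {
  if (!platform_has_fast_fp16()) {
    std::cerr << "This GPU does not support fast FP16; refusing to load the "
                 "FP16-compiled Torch-TensorRT module." << std::endl;
    return 1;
  }
  torch::jit::Module trt_mod = torch::jit::load("model_fp16_trt.ts");
  // ... run inference with trt_mod ...
  return 0;
}
```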
@peri044 Thoughts on expanding the metadata to include information like this? I think we were talking about this a bit earlier. I could see adding platform capability flags to the metadata, but I'm not sure it adds much beyond addressing this special case, since in theory the compute capability check should do the same thing.
@andi4191 now that you are back, thoughts on this feature?
@narendasan: We can check the requested precision against the SM compute capability, as per the hardware precision support matrix: https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix
This check would have to be done at runtime, though.
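Roughly something like the sketch below (illustrative only: the SM-to-FP16 mapping is my reading of the hardware precision matrix linked above, and these function names are not part of the Torch-TensorRT runtime):

```cpp
// Illustrative runtime check: map the current device's SM compute capability
// to FP16 support, per the TensorRT hardware precision matrix.
#include <stdexcept>
#include <string>

#include <cuda_runtime_api.h>

// Per the hardware precision matrix, fast FP16 is listed for SM 5.3, 6.0, 6.2
// and everything from Volta (7.0) onward; notably not for SM 6.1
// (e.g. GTX 1060). This table is illustrative, not exhaustive.
bool sm_supports_fast_fp16(int major, int minor) {
  if (major >= 7) return true;
  if (major == 6) return minor == 0 || minor == 2;
  if (major == 5) return minor == 3;
  return false;
}

// Throws instead of letting the program crash if the engine was built with
// FP16 enabled but the current device cannot run it.
void check_fp16_supported_on_current_device() {
  int device = 0;
  if (cudaGetDevice(&device) != cudaSuccess) {
    throw std::runtime_error("Unable to query the current CUDA device");
  }
  cudaDeviceProp prop{};
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
    throw std::runtime_error("Unable to query CUDA device properties");
  }
  if (!sm_supports_fast_fp16(prop.major, prop.minor)) {
    throw std::runtime_error(
        "Engine was built with FP16 enabled but device sm_" +
        std::to_string(prop.major) + std::to_string(prop.minor) +
        " does not support fast FP16");
  }
}
```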
Well, the question is: will it do anything if engines are not portable across compute capabilities?
I don't think we can do anything other than throw an error instead of letting it crash. IIRC, if the engines are not portable across compute capabilities, TensorRT fails at deserialization.
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
We aren't going to implement this, as Pascal is about to be deprecated in TensorRT.