Zero Zeng
@nvpohanh I guess this is expected since we have more optimized kernels for FP16, am I right?
> I found that even when the BF16 flag is set, the chosen kernels for convolution are still in FP32 precision. Per @nvpohanh's comment, maybe the FP32 conv kernels are faster...
@nvpohanh Will inserting an explicit cast work here?
Can anyone provide steps to reproduce? Thanks!
Requested access.
Use `/usr/src/tensorrt/bin/trtexec --loadEngine=xx.engine --shapes=input:40x3x224x224`, because you are using explicit shapes.
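If you have several inputs or want to script this, here is a minimal Python sketch that assembles the same command line. The helper name `build_trtexec_command` is hypothetical (it is not part of TensorRT); it only formats the `--loadEngine` and `--shapes` arguments shown above, assuming each input is given as `name: (dims...)`.

```python
import shlex


def build_trtexec_command(engine_path, shapes,
                          trtexec="/usr/src/tensorrt/bin/trtexec"):
    """Return the trtexec argument list for a saved engine with explicit shapes.

    Hypothetical helper for illustration. `shapes` maps an input tensor name
    to its dimensions, e.g. {"input": (40, 3, 224, 224)}.
    """
    # trtexec expects name:AxBxC entries, comma-separated for multiple inputs.
    shape_spec = ",".join(
        f"{name}:{'x'.join(str(d) for d in dims)}"
        for name, dims in shapes.items()
    )
    return [trtexec, f"--loadEngine={engine_path}", f"--shapes={shape_spec}"]


cmd = build_trtexec_command("xx.engine", {"input": (40, 3, 224, 224)})
print(shlex.join(cmd))
```

You can pass the returned list to `subprocess.run(cmd)` on a machine where trtexec is installed at that path.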
Does the above code work if you don't use mp? It looks more like a usage issue to me.
> The above code also works without mp. So there is no problem if you don't use mp. Could you please try not using the mp package and instead opening several terminal...
@nvpohanh Is this expected? (torch 650 ms vs trt 590.308 ms)