Zero Zeng

Results 571 comments of Zero Zeng

@nvpohanh I guess it's expected since we have more optimized kernels for FP16, am I right?

> I found that even when the BF16 flag is set, the chosen kernels for convolution are still in FP32 precision. Per @nvpohanh's comment, maybe the FP32 conv kernels are faster...

Use `/usr/src/tensorrt/bin/trtexec --loadEngine=xx.engine --shapes=input:40x3x224x224`, because you are using explicit shapes.
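For reference, the `--shapes` value is a comma-separated list of `name:AxBxC...` entries, one per input tensor. A minimal sketch of building that string in Python, assuming you keep the shapes in a dict; `format_shapes_arg` is a hypothetical helper written for illustration, not part of TensorRT or trtexec:

```python
def format_shapes_arg(shapes):
    """Build a trtexec-style --shapes argument from {tensor_name: dims}.

    Hypothetical helper for illustration only; it just formats a string.
    """
    specs = []
    for name, dims in shapes.items():
        # Each input is written as name:D0xD1x...xDn
        specs.append(f"{name}:" + "x".join(str(d) for d in dims))
    # Multiple inputs are separated by commas
    return "--shapes=" + ",".join(specs)


print(format_shapes_arg({"input": (40, 3, 224, 224)}))
```

The tensor name has to match the input name stored in the engine (`input` in the command above).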

Does the above code work if you don't use mp? Looks more like a usage issue to me.

> The above code also works without mp. So there is no problem if you don't use mp. Could you please try without the mp package, and instead open several terminal...

@nvpohanh Is this expected? (torch 650 ms vs. trt 590.308 ms)