CenterPoint
inference time about int8 pfe
Thank you for your excellent work, @Abraham423
In the Computation Speed section of README.md, I noticed that int8 mode doesn't run faster than fp32/fp16 mode for the pfe module.
Do you know what is the reason?
I can only guess at the following reasons:
- The pfe computation graph is simple (only two groups of linear-bn1d-relu), so float mode already does a good job.
- int8 mode has to compute shift and scale factors to map float values to int values and vice versa, which also takes some time.
- int8 mode doesn't actually run entirely in int8; the TensorRT engine decides which layers run with int values and which with float, and we have no control over that (see the builder sketch below). For the detailed principle, see https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#working-with-int8
So, taken together, int8 mode doesn't show an advantage for pfe.
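To illustrate the third point, here is a minimal builder sketch using the TensorRT Python API (assuming TensorRT 8.x, a hypothetical `pfe.onnx` export, and an int8 calibrator you already have). Enabling the INT8 flag only gives the builder permission to use int8 kernels; per layer it keeps whichever precision its kernel timing finds fastest, so parts of pfe may silently stay in fp16/fp32:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_pfe_int8_engine(onnx_path, calibrator):
    """Build an int8 engine; layer precisions are still chosen by TensorRT."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("failed to parse " + onnx_path)

    config = builder.create_builder_config()
    # INT8/FP16 flags only *allow* those precisions; for each layer the
    # builder keeps the kernel its timing measurements say is fastest.
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.FP16)
    config.int8_calibrator = calibrator
    return builder.build_serialized_network(network, config)

# Hypothetical usage:
# engine_bytes = build_pfe_int8_engine("pfe.onnx", my_calibrator)
# with open("pfe_quant.engine", "wb") as f:
#     f.write(engine_bytes)
```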
Thanks @Abraham423, I agree with you, especially on reason 2.
Another question.
To compute evaluation metrics, we run `python3 waymo_eval.py --cpp_output --save_path ../results`. However, `waymo_eval.py` only supports loading a ckpt model, so I would like to know how to load and evaluate the int8 models, e.g. `pfe_quant.engine` and `rpn_quant.engine`?
Thank you again and looking forward to your reply.
Yes, it only supports a torch ckpt when `--run_infer` is enabled. If you want to compute evaluation metrics for the TensorRT engine results, you should first run the C++ code, then pass its output files to the script and run it in `--cpp_output` mode.
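If you just want to confirm from Python that the quantized engines load correctly before wiring them into the C++ pipeline, a minimal sketch is below (assuming the TensorRT Python bindings are installed and match the version that built the engines; the binding API names vary slightly across TensorRT releases):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(path):
    """Deserialize a serialized TensorRT engine file (e.g. pfe_quant.engine)."""
    with open(path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

if __name__ == "__main__":
    for name in ("pfe_quant.engine", "rpn_quant.engine"):
        engine = load_engine(name)
        # Print the I/O bindings to check names and shapes before inference.
        for i in range(engine.num_bindings):
            print(name, engine.get_binding_name(i), engine.get_binding_shape(i))
```

Actual inference from Python would still need device buffers (e.g. via pycuda or cuda-python) and the same preprocessing as the C++ path, so for computing metrics the `--cpp_output` route above is the simpler option.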