CenterPoint
inference time about int8 pfe
Thank you for your excellent work, @Abraham423
In the Computation Speed section of README.md, I noticed that int8 mode doesn't run faster than fp32/fp16 mode for the pfe module.
Do you know what is the reason?
I can only guess at the following reasons:
- The pfe computation graph is simple (only two groups of linear-bn1d-relu), so float mode already does a good job.
- int8 mode has to compute shift and scale factors to map float values to int values and vice versa, which also takes some time.
- int8 mode doesn't actually run entirely in int8; the TensorRT engine decides which layers run with int values and which with float, and we have no control over that (see the builder sketch below). For the detailed principle, see https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#working-with-int8
So, taken together, int8 mode doesn't show an advantage for pfe.
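To illustrate the third point, here is a minimal builder sketch using the TensorRT Python API (assuming TensorRT 8.x, a hypothetical `pfe.onnx` export, and an int8 calibrator you already have). Enabling the INT8 flag only gives the builder permission to use int8 kernels; per layer it keeps whichever precision its kernel timing finds fastest, so parts of pfe may silently stay in fp16/fp32:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_pfe_int8_engine(onnx_path, calibrator):
    """Build an int8 engine; layer precisions are still chosen by TensorRT."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("failed to parse " + onnx_path)

    config = builder.create_builder_config()
    # INT8/FP16 flags only *allow* those precisions; for each layer the
    # builder keeps the kernel its timing measurements say is fastest.
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.FP16)
    config.int8_calibrator = calibrator
    return builder.build_serialized_network(network, config)

# Hypothetical usage:
# engine_bytes = build_pfe_int8_engine("pfe.onnx", my_calibrator)
# with open("pfe_quant.engine", "wb") as f:
#     f.write(engine_bytes)
```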
Thanks @Abraham423, I agree with you, especially on reason 2.
Another question.
To compute evaluation metrics, we run `python3 waymo_eval.py --cpp_output --save_path ../results`. However, `waymo_eval.py` only supports loading a ckpt model, so I would like to know how to load and evaluate the int8 models, e.g. `pfe_quant.engine` and `rpn_quant.engine`?
Thank you again and looking forward to your reply.
Yes, it only supports a torch ckpt when `--run_infer` is enabled. If you want to compute evaluation metrics for the TensorRT engine results, you should first run the C++ code, then pass its output files to the script and run it in `--cpp_output` mode.
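If you just want to confirm from Python that the quantized engines load correctly before wiring them into the C++ pipeline, a minimal sketch is below (assuming the TensorRT Python bindings are installed and match the version that built the engines; the binding API names vary slightly across TensorRT releases):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(path):
    """Deserialize a serialized TensorRT engine file (e.g. pfe_quant.engine)."""
    with open(path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

if __name__ == "__main__":
    for name in ("pfe_quant.engine", "rpn_quant.engine"):
        engine = load_engine(name)
        # Print the I/O bindings to check names and shapes before inference.
        for i in range(engine.num_bindings):
            print(name, engine.get_binding_name(i), engine.get_binding_shape(i))
```

Actual inference from Python would still need device buffers (e.g. via pycuda or cuda-python) and the same preprocessing as the C++ path, so for computing metrics the `--cpp_output` route above is the simpler option.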