Quantizing a Kaolin Model in TensorRT
Thanks to the Kaolin team first of all! It is a very useful framework. I have one question, though. I used Kaolin to train a PointNet++ model, and then used TensorRT 6 to quantize it, but there was no speedup. My guess is that very few of the ops in PointNet++ are supported by TensorRT, so the converted model may not be efficient, but I cannot confirm that. Has anyone else had this experience? Are there other ways to post-training quantize a PointNet++ model that actually yield a speedup?
Thanks again!
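For anyone who wants to reproduce the setup, here is a minimal sketch of the post-training INT8 build I am describing, using the TensorRT Python API. The file name is a placeholder, and the calibrator (any subclass of `trt.IInt8EntropyCalibrator2` fed with representative point clouds) is left out for brevity:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_int8_engine(onnx_path, calibrator):
    """Parse an ONNX model and build a post-training-quantized INT8 engine."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))   # unsupported ops show up here
            return None

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30      # 1 GiB scratch space
    config.set_flag(trt.BuilderFlag.INT8)    # request INT8 precision
    config.int8_calibrator = calibrator      # drives post-training calibration
    return builder.build_engine(network, config)

# engine = build_int8_engine("pointnet2.onnx", my_calibrator)
```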
Hi @double344931987, thank you for your interest!
Unfortunately, AFAIK PointNet is mainly 3D-conv based, and we don't have such kernels compatible with INT8 in TensorRT yet. Feel free to share any specific bottlenecks you find with PointNet++ and we may be able to help within Kaolin.
Best, Clement
Thanks for your reply. In my experiment, the FPS, Group, and threeNN ops cost most of the time; each of them has at least two loops in its function. Even when I quantize the other ops, the speed does not improve. So, is there some way to speed up these kinds of ops, for example using CUDA to compute in parallel instead of looping, even though I know that is difficult for point clouds?
We are definitely looking into optimizations in Kaolin, including CUDA kernels.
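In the meantime, those loop-heavy ops can often be batched with plain PyTorch tensor operations before resorting to custom CUDA kernels. As a rough illustration only (not Kaolin's actual implementation), farthest point sampling can be reduced to a single loop over the number of samples, and threeNN becomes a pairwise-distance plus top-k:

```python
import torch

def farthest_point_sample(xyz, n_samples):
    """Iterative FPS with the per-point distance loop vectorized.

    xyz: (B, N, 3) point coordinates on the GPU.
    Returns (B, n_samples) indices of the sampled points.
    """
    B, N, _ = xyz.shape
    device = xyz.device
    idx = torch.zeros(B, n_samples, dtype=torch.long, device=device)
    min_dist = torch.full((B, N), float("inf"), device=device)
    farthest = torch.zeros(B, dtype=torch.long, device=device)  # seed with point 0
    batch = torch.arange(B, device=device)
    for i in range(n_samples):
        idx[:, i] = farthest
        centroid = xyz[batch, farthest].unsqueeze(1)   # (B, 1, 3)
        d = ((xyz - centroid) ** 2).sum(-1)            # squared dist to newest centroid
        min_dist = torch.min(min_dist, d)              # running min over all centroids
        farthest = min_dist.argmax(-1)                 # next farthest point per batch
    return idx

def three_nn(unknown, known):
    """Three nearest neighbors of `unknown` (B, n, 3) among `known` (B, m, 3)."""
    d = torch.cdist(unknown, known)                    # (B, n, m) pairwise distances
    dist, idx = d.topk(3, dim=-1, largest=False)       # three smallest per query
    return dist, idx
```

Whether this beats a handwritten kernel depends on point counts, but it removes the Python-level inner loops.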
@doublexxking when you say quantize, is that equivalent to generating a serialized engine? I have been trying to export this model from PyTorch -> ONNX -> TensorRT but have had my own set of issues (like you said, there are unsupported ops, even now). In my case, having it run in TensorRT without any kind of optimization would be sufficient.
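For what it's worth, my export step looks roughly like the sketch below; the stand-in module and shapes are placeholders for the real PointNet++ model, and the unsupported-op failures only surface later, when TensorRT parses the resulting ONNX file:

```python
import torch
import torch.nn as nn

# Stand-in module purely to demonstrate the export call; substitute the
# trained PointNet++ model here.
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 40))
dummy_input = torch.randn(1, 4096, 3)  # assumed (batch, points, xyz) layout

torch.onnx.export(
    model, dummy_input, "pointnet2.onnx",
    opset_version=11,                   # pick an opset your TensorRT version supports
    input_names=["points"],
    output_names=["logits"],
)
```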