
Any idea how to make the inference faster?

CCodie opened this issue 3 years ago • 3 comments

Thanks a lot for sharing this awesome code!

I'm really looking forward to doing some experiments with this great model, but right now I'm struggling with the run time.

In my environment with an RTX 3090, the forward pass alone takes 80-90 ms.

I want to get the entire process (pre-processing + forward pass + post-processing) under 100 ms, which is the sampling period of my LiDAR. (Under 80 ms would be better, for some margin.)
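A simple way to check whether the pipeline fits such a budget is to time each stage separately. A minimal sketch with stand-in functions (`preprocess`, `forward`, `postprocess` are hypothetical placeholders, not functions from this repo; with a real GPU model you would also need to synchronize the device before reading the clock):

```python
import time

def time_stage(fn, *args):
    """Run fn and return (result, elapsed milliseconds)."""
    t0 = time.perf_counter()
    out = fn(*args)
    ms = (time.perf_counter() - t0) * 1000.0
    return out, ms

# Hypothetical stand-ins for the real pipeline stages.
def preprocess(points):   return points
def forward(voxels):      return voxels
def postprocess(logits):  return logits

data = list(range(1000))
total_ms = 0.0
for stage in (preprocess, forward, postprocess):
    data, ms = time_stage(stage, data)
    print(f"{stage.__name__}: {ms:.2f} ms")
    total_ms += ms
print(f"total: {total_ms:.2f} ms, within 100 ms budget: {total_ms < 100.0}")
```

Per-stage numbers make it clear whether the forward pass is really the dominant cost before optimizing it.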

I can try a few things, such as reducing the range and resolution of the point clouds, to make pre-/post-processing faster.
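Cropping the point cloud to a tighter range before voxelization is straightforward; fewer points means fewer voxels for every downstream stage. A sketch with NumPy (the range limits below are made-up example values, not Cylinder3D's defaults):

```python
import numpy as np

def crop_range(points,
               x_lim=(-40.0, 40.0),
               y_lim=(-40.0, 40.0),
               z_lim=(-3.0, 1.0)):
    """Keep only points inside the given axis-aligned box.
    points: (N, 4) array of x, y, z, intensity."""
    m = ((points[:, 0] >= x_lim[0]) & (points[:, 0] <= x_lim[1]) &
         (points[:, 1] >= y_lim[0]) & (points[:, 1] <= y_lim[1]) &
         (points[:, 2] >= z_lim[0]) & (points[:, 2] <= z_lim[1]))
    return points[m]

pts = np.array([[ 0.0,   0.0,  0.0, 0.1],
                [80.0,   0.0,  0.0, 0.2],   # outside x range, dropped
                [10.0, -10.0, -1.0, 0.3]])
print(crop_range(pts).shape)  # (2, 4)
```

Note that if the model was trained on the full range, cropping at inference time may cost some accuracy at the boundaries.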

But I'm not sure how to make the model forward pass faster.

Can you give me some advice? Thanks!

CCodie avatar Mar 16 '22 05:03 CCodie

Hello @CCodie,

Can you share some details about how you managed to run this on an RTX 3090? Library versions, CUDA version, and so on. Did you make any changes to the code? I am trying to run this on an RTX 3060 Ti, but I cannot get it to run with CUDA 10.2, and changing library versions seems to have been an issue for many people.

To make the forward pass faster, I guess you could try reducing the size of the network and retraining it, to see whether you can keep similar performance while being faster.

mpQuintana avatar Mar 20 '22 12:03 mpQuintana

Hello @mpQuintana, actually I wanted to ask whether there are any configuration options to easily reduce the size of the network, but thanks for your advice!

I'm developing under Ubuntu 20.04 / RTX 3090 / CUDA 11.1, with compatible versions of PyTorch and torch-scatter. I remember that getting a working version of the spconv library was quite tricky. I tried spconv 2.x.x and changed some code to make it run, but it gave me totally different results. I solved that problem by just using spconv 1.2.1, which is the exact version the author used.

CCodie avatar Mar 21 '22 02:03 CCodie

@CCodie unfortunately, spconv v1.x is quite slow and deprecated. I have forked the project to implement the spconv v2.1.x version (not tested yet with v2.2.x). At home I am getting a 2-3x speedup with an RTX 3060, and it should be even faster with spconv v2.2.x.

This repo also converts the tensor types inside the training loop (rather than in the dataloader) and computes validation on the CPU (without batch support). Further speedups are possible there.
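Moving the type conversion out of the training loop and into the dataset's `__getitem__` lets the dataloader workers do it in parallel instead of blocking the main loop. A minimal sketch with NumPy (the class and field names are hypothetical, not from this repo):

```python
import numpy as np

class PointCloudDataset:
    """Minimal dataset sketch: do the dtype conversion here, so
    dataloader workers handle it instead of the training loop."""
    def __init__(self, scans):
        self.scans = scans          # list of raw (N, 4) float64 arrays

    def __len__(self):
        return len(self.scans)

    def __getitem__(self, idx):
        # Convert once per sample in a worker process, rather than
        # per training step on the main thread.
        return self.scans[idx].astype(np.float32)

ds = PointCloudDataset([np.random.rand(100, 4)])
print(ds[0].dtype)  # float32
```

With a PyTorch `DataLoader` and `num_workers > 0`, this conversion then overlaps with the GPU forward pass instead of adding to it.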

L-Reichardt avatar Sep 30 '22 14:09 L-Reichardt