superpoint_graph

How to improve the efficiency of partition.py?

cbyzju opened this issue on Jan 29, 2019 · 6 comments

When the map is large, for example the whole of Paris, it can contain billions of points and the partition stage takes several days. Do you have any idea how to improve the efficiency of partition.py? Regarding the "reg_strength" parameter in partition.py: I found that a large reg_strength is better for segmenting large objects, while a small reg_strength is better for tiny objects. Is that correct?

cbyzju — Jan 29, 2019

Hi,

The partition is indeed the bottleneck of our pipeline. As of now, the only way to speed things up would be to increase the voxel width of the subsampling.
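(For illustration, here is a minimal numpy sketch of what increasing the voxel width of the subsampling does. This is not the repository's own pruning code, just the idea: coarser voxels keep fewer points, so the partition has less work to do.)

```python
import numpy as np

def voxel_subsample(xyz, voxel_width):
    """Keep one representative point per occupied voxel of size voxel_width."""
    keys = np.floor(xyz / voxel_width).astype(np.int64)  # integer voxel coordinates
    _, idx = np.unique(keys, axis=0, return_index=True)  # one point per unique voxel
    return xyz[idx]

# a 10x larger voxel width drastically reduces the point count fed to the partition
cloud = np.random.rand(100_000, 3)
print(len(voxel_subsample(cloud, 0.01)), len(voxel_subsample(cloud, 0.1)))
```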

However, we will release a new implementation of the L0 cut pursuit algorithm within one month. It allows CPU parallelization of the graph cuts and will speed up the partition by an order of magnitude. I will post in this issue as soon as it is ready.

As of now, the reg_strength is set for the whole graph. You should set it such that the smallest objects are not overlooked. However, we will soon (in 1 or 2 months) release a supervised partition model, which will allow for more precise and adaptive superpoints.

loicland — Jan 29, 2019

Hi Loïc,

I have been looking into learning a parameterization for the superpoint partition based on input cloud properties and an extended family of features in addition to yours (linearity, planarity, scattering, verticality, optional RGB), but so far I have not been very successful in making the partition supervised.

I would be very interested in hearing about your partition acceleration, and even more so if you make it supervised! How is that going?

PS: For what it is worth, I did manage to reduce the overall partition time by working on the subsampling and on alternatives to sklearn's off-the-shelf nearest-neighbor algorithm.

drprojects — Feb 6, 2019

Hi,

The L0 cut pursuit is a greedy algorithm, and the current implementation might not perform so well if the dimension of the vertex features is too large. The next version of L0-CP (soon, I promise!) might be better, though.

But our upcoming supervised partition works much better than the current one, which translates into a significant increase in precision for the whole pipeline. As the manuscript is currently under review I can't post it yet, but a preprint is coming out soon.

loicland — Feb 6, 2019

Which alternative to sklearn do you recommend for the nearest neighbors?

loicland — Feb 6, 2019

That sounds great regarding your new supervised partition, really looking forward to trying it!

As for K-NN searches in a pure CPU context, I found that sklearn with a kd-tree and parallelized jobs still gives good results compared to other methods. Since the most costly operation is not the kd-tree construction but the query, for a very dense cloud where I can afford to work on subsampled data but still want rich local features, fitting the tree on the full cloud and querying only the subsampled points saves a lot of time. This looks like the snippet below.
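(The same snippet, expanded into a self-contained form; the array names and sizes are placeholders for the clouds described above.)

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# stand-in data: a dense raw cloud and the subsampled points we actually query
xyz_raw = np.random.rand(1_000_000, 3)
xyz_subsampled = xyz_raw[::100]

k = 45
# fit the kd-tree on the full cloud, but run the (expensive) query only on the
# subsampled points; n_jobs=-1 parallelizes the query over all CPU cores
nn = NearestNeighbors(n_neighbors=k, algorithm='kd_tree', n_jobs=-1).fit(xyz_raw)
distances, neighbors = nn.kneighbors(X=xyz_subsampled)   # shape: (n_queries, k)
```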

But for large point clouds, if a GPU is available, FAISS works great for me: https://github.com/facebookresearch/faiss Rather than splitting the search space into axis-aligned boxes like a kd-tree, it builds Voronoi cells based on the data distribution. It handles a full large cloud well, but requires choosing the number of cells. Overall, a good heuristic for trading off speed and accuracy in K-NN searches seems to be num_cells = c * sqrt(num_points), with c roughly in [0.5, 5.0]. A sketch is given below.
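(A minimal CPU-side sketch of that FAISS setup; the array names and sizes are placeholders, and nprobe is a tuning knob I am assuming you would adjust for your own accuracy/speed trade-off.)

```python
import numpy as np
import faiss  # pip install faiss-cpu (or faiss-gpu for the GPU version)

# stand-in data: full cloud to index, subsampled points to query
xyz_raw = np.random.rand(1_000_000, 3).astype('float32')
xyz_query = xyz_raw[::100]

k = 45                                         # neighbors per query point
num_cells = int(1.0 * np.sqrt(len(xyz_raw)))   # heuristic above, with c = 1.0 in [0.5, 5.0]

quantizer = faiss.IndexFlatL2(3)                     # coarse index holding the cell centroids
index = faiss.IndexIVFFlat(quantizer, 3, num_cells)  # inverted file over Voronoi cells
index.train(xyz_raw)                                 # learn the cell centroids
index.add(xyz_raw)                                   # assign all points to their cells
index.nprobe = 16                                    # cells visited per query: higher = more accurate, slower
distances, neighbors = index.search(xyz_query, k)    # shape: (n_queries, k)
```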

I benchmarked this on a ~10M-point dataset, doing a 45-NN search for a query subset of 10K points. As a baseline, the sklearn K-NN search with a kd-tree took about 40 seconds on my machine. With FAISS, I got the same results, with less than 2% error on the farthest neighbors, in about 25-30 seconds. And the larger the query set, the larger the gap in query time, I reckon.

Hope this helps!

drprojects — Feb 7, 2019

It's in the works, will let you know asap!

loicland — Mar 6, 2019