Kimera-Semantics

Using a dynamic number of labels

Open RozDavid opened this issue 4 years ago • 3 comments

Addressing my own issue #55: this uses dynamically sized semantic prior matrices initialized at runtime, reading the number of labels from the launch file as a rosparam.
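To make the idea concrete, here is a minimal stand-in sketch of a runtime-sized semantic prior matrix. The function name, the probability layout, and the use of a flat `std::vector` are all hypothetical and only keep the sketch self-contained; the actual code reads the label count via a rosparam and stores the prior in an Eigen matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a runtime-sized semantic prior: a row-major
// num_labels x num_labels matrix with a high probability on the diagonal
// (a label matches itself) and the remaining mass spread uniformly over
// the off-diagonal entries.
std::vector<float> makeSemanticPrior(std::size_t num_labels,
                                     float match_probability) {
  const float off_diag =
      (1.0f - match_probability) / static_cast<float>(num_labels - 1);
  std::vector<float> prior(num_labels * num_labels, off_diag);
  for (std::size_t i = 0; i < num_labels; ++i) {
    prior[i * num_labels + i] = match_probability;
  }
  return prior;
}
```

Because the size is a runtime value rather than a template parameter, the storage necessarily lives on the heap, which is the allocation-performance question raised in this thread.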

I believe the biggest concern is performance, so here is a comparison of timings on the provided simulated semantic rosbag file.

With static labels


[ INFO] [1612796164.603515954, 60.342383269]: Layer memory: 16679159
[ INFO] [1612796164.603544030, 60.342383269]: Updating mesh.
[ INFO] [1612796164.620216339, 60.367825609]: Updating mesh.78.072170               
[ INFO] [1612796164.639555585, 60.414999999]: Integrating a pointcloud with 345600 points.
[ INFO] [1612796164.711103207, 60.544733472]: Finished integrating in 0.071499 seconds, have 339 blocks.
[ INFO] [1612796164.711173976, 60.544733472]: Timings: 
SM Timing
-----------
inserting_missed_blocks	     59	00.000255	(00.000004 +- 00.000003)	[00.000001,00.000094]
integrate/fast         	     59	05.165177	(00.087545 +- 00.009905)	[00.065052,00.165955]
mesh/publish           	    108	00.286112	(00.002649 +- 00.003277)	[00.000005,00.009061]
mesh/update            	    108	00.578068	(00.005352 +- 00.004863)	[00.000168,00.014176]
ptcloud_preprocess     	     59	00.590554	(00.010009 +- 00.004439)	[00.008309,00.034639]
remove_distant_blocks  	     59	00.000979	(00.000017 +- 00.000005)	[00.000006,00.000031]

With dynamic labels

[ INFO] [1612795707.691550318, 59.152784140]: Layer memory: 16629958
[ INFO] [1612795707.691564063, 59.152784140]: Updating mesh.
[ INFO] [1612795707.707433151, 59.172894621]: Updating mesh.78.072170               
[ INFO] [1612795707.728666358, 59.233487071]: Updating mesh.78.072170               
[ INFO] [1612795707.775409911, 59.323973636]: Integrating a pointcloud with 345600 points.
[ INFO] [1612795707.861115732, 59.483209653]: Finished integrating in 0.085651 seconds, have 339 blocks.
[ INFO] [1612795707.861209100, 59.483209653]: Timings: 
SM Timing
-----------
inserting_missed_blocks	     54	00.000232	(00.000004 +- 00.000003)	[00.000001,00.000066]
integrate/fast         	     54	05.019893	(00.092961 +- 00.010746)	[00.070486,00.244003]
mesh/publish           	     98	00.245623	(00.002506 +- 00.003167)	[00.000007,00.009175]
mesh/update            	     98	00.499049	(00.005092 +- 00.005313)	[00.000144,00.018225]
ptcloud_preprocess     	     54	00.486605	(00.009011 +- 00.000462)	[00.008290,00.018598]
remove_distant_blocks  	     54	00.000844	(00.000016 +- 00.000005)	[00.000002,00.000034]

RozDavid avatar Feb 08 '21 15:02 RozDavid

Hey @ToniRV,

Thanks a lot for the feedback and also for the ideas. I think it would be really nice to have this runtime operation, but you are also perfectly right that it can't come at the cost of a performance loss.

I am sadly not an expert here, but I did some quick research on Eigen's dynamic memory allocation and found this. Source.

Here is this constructor:

inline DenseStorage(int size, int rows, int) : m_data(internal::aligned_new<T>(size)), m_rows(rows) {}

Here, the m_data member is the actual array of coefficients of the matrix. As you see, it is dynamically allocated. Rather than calling new[] or malloc(), as you can see, we have our own internal::aligned_new defined in src/Core/util/Memory.h. What it does is that if vectorization is enabled, then it uses a platform-specific call to allocate a 128-bit-aligned array, as that is very useful for vectorization with both SSE2 and AltiVec. If vectorization is disabled, it amounts to the standard new[].

I believe that if we don't change the EIGEN_DONT_ALIGN and EIGEN_MAX_ALIGN_BYTES directives (source) in the CMake build or with #defines, the memory alignment should be all right and the matrix operations will be automatically vectorized.

What do you think of this? If I replace the initialization with internal::aligned_new<T>, could that bridge the performance gap?
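As a rough illustration of what internal::aligned_new amounts to, the sketch below allocates a float array on a 16-byte (128-bit) boundary, matching the SSE2/AltiVec requirement quoted above. The function names are made up; std::aligned_alloc from C++17 is used as a portable stand-in, whereas Eigen dispatches to platform-specific calls.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Rough equivalent of Eigen's internal::aligned_new for a float array:
// allocate on a 16-byte (128-bit) boundary so SIMD loads stay aligned.
// std::aligned_alloc requires the size to be a multiple of the alignment,
// so round it up first. Free the result with std::free.
float* alignedNewFloats(std::size_t n) {
  std::size_t bytes = n * sizeof(float);
  bytes = (bytes + 15) / 16 * 16;  // round up to a multiple of 16
  return static_cast<float*>(std::aligned_alloc(16, bytes));
}

// Check that a pointer sits on a 16-byte boundary.
bool isAligned16(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```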

RozDavid avatar Feb 09 '21 08:02 RozDavid

@RozDavid I also understood, when re-reading the Eigen docs, that the operations should already be vectorized... So it might well be that we can't improve on that. It's fine by me; I think this adds a lot of flexibility anyway. Another thing we should be careful about is the map size: having a 128-bit-aligned array on a per-voxel basis may dramatically increase the size of the volumetric map. Maybe you could generate a map with and without this feature and see how many MB one is with respect to the other? The rosservice save_map should do just that.

ToniRV avatar Feb 09 '21 13:02 ToniRV

Hello @ToniRV,

I ran a few quick tests comparing the static and dynamic approaches with different numbers of labels for the probability matrices.

For the static tests I recompiled the code with kTotalNumberOfLabels=128 and changed the hardcoded prior to the appropriate number, then tested the same 21 and 128 label settings with dynamic sizes as well. I ran the full rosbag, saved the layer in a vxblx file, and copied the timings for the last pointcloud integration, i.e. with the maximum number of initialized blocks. The stats are copied at the end of this comment.

It was a bit surprising to me that there is no difference in the layer sizes, only in the integration times. My interpretation is that Eigen allocates fixed-size memory up to a certain matrix size (which turns out to be a bigger threshold than 128).

As all the vxblx file sizes are between 66.3 and 67.2 MB, the only differences here are the number of allocated blocks and an approximately 15% performance loss in pointcloud integration.
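For reference, the "approximately 15%" can be reconstructed from the integrate/fast per-call means in the timing tables in this comment; the helper function below is only for illustration.

```cpp
#include <cassert>

// Percentage slowdown of the dynamic integrate/fast mean over the static
// one, using the per-call means (in seconds) from the timing tables.
double slowdownPercent(double static_mean, double dynamic_mean) {
  return (dynamic_mean - static_mean) / static_mean * 100.0;
}
// 21 labels:  0.086136 s -> 0.095071 s, ~10.4% slower
// 128 labels: 0.367186 s -> 0.440314 s, ~19.9% slower
// The two cases average out to roughly 15%.
```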

It surely depends on the use case whether the flexibility is worth the performance loss, but I wanted to share this with you either way, whether you choose to merge or not.

The results can be compared here:

########## Dynamic 21 ##########

Vxblx file size: 67.1Mb

[ INFO] [1612890061.997756189, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612890062.089571026, 121.805000000]: Finished integrating in 0.091762 seconds, have 1207 blocks.
[ INFO] [1612890062.089766452, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks	    285	00.001599	(00.000006 +- 00.000001)	[00.000001,00.000136]
integrate/fast         	    285	27.095124	(00.095071 +- 00.004953)	[00.058792,00.243477]
mesh/publish           	    399	01.489049	(00.003732 +- 00.002256)	[00.000006,00.013048]
mesh/update            	    399	03.108085	(00.007790 +- 00.002853)	[00.000171,00.018809]
ptcloud_preprocess     	    285	02.596166	(00.009109 +- 00.000871)	[00.008302,00.036305]
remove_distant_blocks  	    285	00.019437	(00.000068 +- 00.000045)	[00.000004,00.000402]

[ INFO] [1612890062.089818329, 121.805000000]: Layer memory: 59385627
[ INFO] [1612890062.089849037, 121.805000000]: Updating mesh.

########## Dynamic 128 ##########

Vxblx file size: 66.3Mb

[ INFO] [1612889885.600096098, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612889886.008610556, 121.805000000]: Finished integrating in 0.408455 seconds, have 1205 blocks.
[ INFO] [1612889886.008808061, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks	     83	00.001354	(00.000016 +- 00.000028)	[00.000001,00.000178]
integrate/fast         	     83	36.546054	(00.440314 +- 00.047001)	[00.274163,00.750914]
mesh/publish           	     92	00.422405	(00.004591 +- 00.001759)	[00.000006,00.011521]
mesh/update            	     92	00.844049	(00.009174 +- 00.001211)	[00.000178,00.014639]
ptcloud_preprocess     	     83	00.757355	(00.009125 +- 00.001122)	[00.008222,00.022541]
remove_distant_blocks  	     83	00.006077	(00.000073 +- 00.000031)	[00.000002,00.000203]

[ INFO] [1612889886.008843359, 121.805000000]: Layer memory: 59287225

########## Static 21 ##########

Vxblx file size: 67.2Mb

[ INFO] [1612890454.482768384, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612890454.564294501, 121.805000000]: Finished integrating in 0.081479 seconds, have 1209 blocks.
[ INFO] [1612890454.564449636, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks	    290	00.001434	(00.000005 +- 00.000001)	[00.000001,00.000117]
integrate/fast         	    290	24.979322	(00.086136 +- 00.004859)	[00.053003,00.161756]
mesh/publish           	    493	01.521718	(00.003087 +- 00.003551)	[00.000007,00.008997]
mesh/update            	    493	03.285606	(00.006665 +- 00.005909)	[00.000138,00.017277]
ptcloud_preprocess     	    290	02.654694	(00.009154 +- 00.002964)	[00.008272,00.034363]
remove_distant_blocks  	    290	00.018787	(00.000065 +- 00.000037)	[00.000002,00.000652]

[ INFO] [1612890454.564478470, 121.805000000]: Layer memory: 59484029

########## Static 128 ##########

Vxblx file size: 66.5Mb

[ INFO] [1612891013.389336555, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612891013.729743330, 121.805000000]: Finished integrating in 0.340356 seconds, have 1204 blocks.
[ INFO] [1612891013.729900746, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks	     97	00.001169	(00.000012 +- 00.000019)	[00.000001,00.000123]
integrate/fast         	     97	35.617008	(00.367186 +- 00.036256)	[00.230930,00.561777]
mesh/publish           	    106	00.547046	(00.005161 +- 00.001862)	[00.000006,00.012017]
mesh/update            	    106	01.092757	(00.010309 +- 00.001616)	[00.000182,00.018571]
ptcloud_preprocess     	     97	00.884389	(00.009117 +- 00.000556)	[00.008329,00.018017]
remove_distant_blocks  	     97	00.005814	(00.000060 +- 00.000037)	[00.000003,00.000304]

[ INFO] [1612891013.729936872, 121.805000000]: Layer memory: 59238024
[ INFO] [1612891013.729965223, 121.805000000]: Updating mesh.

RozDavid avatar Feb 09 '21 18:02 RozDavid