Using dynamic number of labels
This addresses my own issue #55 by using dynamically sized semantic prior matrices that are initialized at runtime. The number of labels is read from the launch file as a rosparam.
I believe the biggest concern is performance, so here is a comparison of timings on the provided simulated semantic rosbag file.
With static labels
[ INFO] [1612796164.603515954, 60.342383269]: Layer memory: 16679159
[ INFO] [1612796164.603544030, 60.342383269]: Updating mesh.
[ INFO] [1612796164.620216339, 60.367825609]: Updating mesh.78.072170
[ INFO] [1612796164.639555585, 60.414999999]: Integrating a pointcloud with 345600 points.
[ INFO] [1612796164.711103207, 60.544733472]: Finished integrating in 0.071499 seconds, have 339 blocks.
[ INFO] [1612796164.711173976, 60.544733472]: Timings:
SM Timing
-----------
inserting_missed_blocks 59 00.000255 (00.000004 +- 00.000003) [00.000001,00.000094]
integrate/fast 59 05.165177 (00.087545 +- 00.009905) [00.065052,00.165955]
mesh/publish 108 00.286112 (00.002649 +- 00.003277) [00.000005,00.009061]
mesh/update 108 00.578068 (00.005352 +- 00.004863) [00.000168,00.014176]
ptcloud_preprocess 59 00.590554 (00.010009 +- 00.004439) [00.008309,00.034639]
remove_distant_blocks 59 00.000979 (00.000017 +- 00.000005) [00.000006,00.000031]
With dynamic labels
[ INFO] [1612795707.691550318, 59.152784140]: Layer memory: 16629958
[ INFO] [1612795707.691564063, 59.152784140]: Updating mesh.
[ INFO] [1612795707.707433151, 59.172894621]: Updating mesh.78.072170
[ INFO] [1612795707.728666358, 59.233487071]: Updating mesh.78.072170
[ INFO] [1612795707.775409911, 59.323973636]: Integrating a pointcloud with 345600 points.
[ INFO] [1612795707.861115732, 59.483209653]: Finished integrating in 0.085651 seconds, have 339 blocks.
[ INFO] [1612795707.861209100, 59.483209653]: Timings:
SM Timing
-----------
inserting_missed_blocks 54 00.000232 (00.000004 +- 00.000003) [00.000001,00.000066]
integrate/fast 54 05.019893 (00.092961 +- 00.010746) [00.070486,00.244003]
mesh/publish 98 00.245623 (00.002506 +- 00.003167) [00.000007,00.009175]
mesh/update 98 00.499049 (00.005092 +- 00.005313) [00.000144,00.018225]
ptcloud_preprocess 54 00.486605 (00.009011 +- 00.000462) [00.008290,00.018598]
remove_distant_blocks 54 00.000844 (00.000016 +- 00.000005) [00.000002,00.000034]
Hey @ToniRV,
Thanks a lot for the feedback and the ideas. I think it would be really nice to have this runtime flexibility, but you are also perfectly right that it can't come at the cost of a performance loss.
I am sadly not an expert here, but I did some quick research on Eigen's dynamic memory allocation and found this. Source.
Here is the relevant constructor:

```cpp
inline DenseStorage(int size, int rows, int)
    : m_data(internal::aligned_new<T>(size)), m_rows(rows) {}
```

> Here, the m_data member is the actual array of coefficients of the matrix. As you see, it is dynamically allocated. Rather than calling new[] or malloc(), as you can see, we have our own internal::aligned_new defined in src/Core/util/Memory.h. What it does is that if vectorization is enabled, then it uses a platform-specific call to allocate a 128-bit-aligned array, as that is very useful for vectorization with both SSE2 and AltiVec. If vectorization is disabled, it amounts to the standard new[].
I believe that as long as we don't change the EIGEN_DONT_ALIGN and EIGEN_MAX_ALIGN_BYTES directives (source) in the CMake build or with #defines, the memory alignment should be all right and the matrices will be vectorized automatically.
What do you think? If I replace the initialization with internal::aligned_new<T>, could that bridge the performance gap?
@RozDavid When re-reading the Eigen docs I also understood that the operations should already be vectorized... so it may well be that we can't improve on that. That's fine by me; I think this adds a lot of flexibility anyway.
Another thing we should be careful about is the map size: having a 128-bit-aligned array on a per-voxel basis may dramatically increase the size of the volumetric map. Maybe you could generate a map with and without this feature and compare how many MB each one takes? The rosservice save_map should do just that.
Hello @ToniRV,
I ran a few quick tests comparing the static and dynamic approaches with different numbers of labels for the probability matrices.
For the static tests I recompiled the code with kTotalNumberOfLabels=128 and changed the hardcoded prior to the appropriate number, then tested the same 21- and 128-label settings with the dynamic sizes as well. I ran the full rosbag, saved the layer in a vxblx file, and copied the timings for the last pointcloud integration, i.e. with the maximum number of initialized blocks. The stats are copied at the end of this comment.
It was a bit surprising to me that there is no difference between the layer sizes, only in the integration times. My interpretation is that Eigen allocates a fixed amount of memory up to a certain matrix size (which apparently is a larger threshold than 128).
As all the vxblx file sizes are between 66.3 and 67.2 MB, the only differences here are the number of allocated blocks and a roughly 15% performance loss in pointcloud integration.
Whether the flexibility is worth the performance loss surely depends on the use case, but I wanted to share this with you either way, whether you choose to merge or not.
The results can be compared here:
########## Dynamic 21 ##########
Vxblx file size: 67.1Mb
[ INFO] [1612890061.997756189, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612890062.089571026, 121.805000000]: Finished integrating in 0.091762 seconds, have 1207 blocks.
[ INFO] [1612890062.089766452, 121.805000000]: Timings:
SM Timing
-----------
inserting_missed_blocks 285 00.001599 (00.000006 +- 00.000001) [00.000001,00.000136]
integrate/fast 285 27.095124 (00.095071 +- 00.004953) [00.058792,00.243477]
mesh/publish 399 01.489049 (00.003732 +- 00.002256) [00.000006,00.013048]
mesh/update 399 03.108085 (00.007790 +- 00.002853) [00.000171,00.018809]
ptcloud_preprocess 285 02.596166 (00.009109 +- 00.000871) [00.008302,00.036305]
remove_distant_blocks 285 00.019437 (00.000068 +- 00.000045) [00.000004,00.000402]
[ INFO] [1612890062.089818329, 121.805000000]: Layer memory: 59385627
[ INFO] [1612890062.089849037, 121.805000000]: Updating mesh.
########## Dynamic 128 ##########
Vxblx file size: 66.3Mb
[ INFO] [1612889885.600096098, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612889886.008610556, 121.805000000]: Finished integrating in 0.408455 seconds, have 1205 blocks.
[ INFO] [1612889886.008808061, 121.805000000]: Timings:
SM Timing
-----------
inserting_missed_blocks 83 00.001354 (00.000016 +- 00.000028) [00.000001,00.000178]
integrate/fast 83 36.546054 (00.440314 +- 00.047001) [00.274163,00.750914]
mesh/publish 92 00.422405 (00.004591 +- 00.001759) [00.000006,00.011521]
mesh/update 92 00.844049 (00.009174 +- 00.001211) [00.000178,00.014639]
ptcloud_preprocess 83 00.757355 (00.009125 +- 00.001122) [00.008222,00.022541]
remove_distant_blocks 83 00.006077 (00.000073 +- 00.000031) [00.000002,00.000203]
[ INFO] [1612889886.008843359, 121.805000000]: Layer memory: 59287225
########## Static 21 ##########
Vxblx file size: 67.2Mb
[ INFO] [1612890454.482768384, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612890454.564294501, 121.805000000]: Finished integrating in 0.081479 seconds, have 1209 blocks.
[ INFO] [1612890454.564449636, 121.805000000]: Timings:
SM Timing
-----------
inserting_missed_blocks 290 00.001434 (00.000005 +- 00.000001) [00.000001,00.000117]
integrate/fast 290 24.979322 (00.086136 +- 00.004859) [00.053003,00.161756]
mesh/publish 493 01.521718 (00.003087 +- 00.003551) [00.000007,00.008997]
mesh/update 493 03.285606 (00.006665 +- 00.005909) [00.000138,00.017277]
ptcloud_preprocess 290 02.654694 (00.009154 +- 00.002964) [00.008272,00.034363]
remove_distant_blocks 290 00.018787 (00.000065 +- 00.000037) [00.000002,00.000652]
[ INFO] [1612890454.564478470, 121.805000000]: Layer memory: 59484029
########## Static 128 ##########
Vxblx file size: 66.5Mb
[ INFO] [1612891013.389336555, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612891013.729743330, 121.805000000]: Finished integrating in 0.340356 seconds, have 1204 blocks.
[ INFO] [1612891013.729900746, 121.805000000]: Timings:
SM Timing
-----------
inserting_missed_blocks 97 00.001169 (00.000012 +- 00.000019) [00.000001,00.000123]
integrate/fast 97 35.617008 (00.367186 +- 00.036256) [00.230930,00.561777]
mesh/publish 106 00.547046 (00.005161 +- 00.001862) [00.000006,00.012017]
mesh/update 106 01.092757 (00.010309 +- 00.001616) [00.000182,00.018571]
ptcloud_preprocess 97 00.884389 (00.009117 +- 00.000556) [00.008329,00.018017]
remove_distant_blocks 97 00.005814 (00.000060 +- 00.000037) [00.000003,00.000304]
[ INFO] [1612891013.729936872, 121.805000000]: Layer memory: 59238024
[ INFO] [1612891013.729965223, 121.805000000]: Updating mesh.