dynamic_reconfigure
Parameter description messages serialising to incorrect length on Arm
Issue
When the parameter description message for a custom config is serialised, the serialised message is much larger than expected, which saturates the network every time a client connects. The issue only seems to occur when a config has multiple parameters of the same type and the server is run on ARM in a Release build.
Language | Arch | Build | Serialised Bytes | Deserialised Message |
---|---|---|---|---|
C++ | X86 | Release | 335B | Valid |
C++ | ARM | Release | 660MB | Valid |
C++ | ARM | Debug | 355B | Valid |
Steps To Recreate
ROS Distro - noetic
A test package that can be used to recreate this issue is provided here: https://github.com/ashnap123/dynamic_reconfigure_test
The following is the simplest way to recreate the issue. It must be run on ARM, and it is included as a test in the above package.
catkin_make run_tests -DCMAKE_BUILD_TYPE=Release dynamic_test
TEST(ParameterDescriptionSerialsation, test_serialisation_multiple_parameters) {
    auto description = dynamic_test::ExampleBrokenConfig::__getDescriptionMessage__();
    auto serialisationLength = ros::serialization::serializationLength(description);
    EXPECT_LT(serialisationLength, 1024);
}
Also, running the tests for the dynamic_reconfigure package itself on ARM in Release produces the following error, which I assume is caused by the same issue:
[ERROR] [1621350696.145575854]: a message of over a gigabyte was predicted in tcpros. that seems highly unlikely, so I'll assume protocol synchronization is lost.
Just for reference, we've seen what I assume is the result of this in production on arm64.
This looks like it could be a GCC compiler issue, the same as https://github.com/ros/ros_comm/issues/2197 & https://github.com/ros/roscpp_core/issues/130
A workaround would therefore be to upgrade GCC, or to avoid the Release build type so that the code is compiled with -O2 rather than the -O3 that Release implies.
PR https://github.com/ros/roscpp_core/pull/136 should fix this. I'm looking for someone who could verify it. Please note that Focal now ships GCC 9.4 by default, with which I could not reproduce the issue, so the test would need to be done with GCC 9.3 installed explicitly and dynamic_reconfigure built from source.