ros_comm icon indicating copy to clipboard operation
ros_comm copied to clipboard

No UDP connection when total length of (node name + topic name + msg name) is of specific length (= 87?)

Open j3sq opened this issue 2 years ago • 1 comments

This sounds really silly, but I was able to reproduce it in different setups and managed to create a simple reproduction (15 lines of cpp) example to test on other machines using docker. I put the the reproduction workspace in this repo: https://github.com/j3sq/ros_udp_issue with steps to reproduce in the README.md file

Short explanation: While migrating a private project from Kinetic to Noetic we noticed that some topics with unreliable() transport hint were not working. When checking with roswtf it confirmed that by saying: node a and node b should be connected but they're not.

Oddly, by accident, we realized that when the node name was changed it was working again as expected. We kept narrowing down and discovered that if we change any of node name, topic name or msg name then the code works again as expected. I then tried in different docker container and noticed that the issue exists as back as Lunar release.

The cpp code is listed here (and can be found in the reproduction example):

#include "ros/ros.h"
#include "test_pkg/aabbccddeeffgghhiij.h"

int main(int argc, char **argv) {
  ros::init(argc, argv, "abcdefghijklmnopqrstuvw");
  ros::NodeHandle nh{};
  ros::Subscriber pub = nh.subscribe<test_pkg::aabbccddeeffgghhiij>(
      "/abcdefghijklmnopqrstuvwxyzabcdefgh", 1,
      [](const test_pkg::aabbccddeeffgghhiij::ConstPtr &msg) {
        ROS_INFO("Received msg with foo=%d", msg->foo);
      },
      ros::TransportHints().unreliable().reliable()); // must be UDP connection

  ROS_INFO("Waiting for msg....");
  ros::spin();
}

The publisher is:

rostopic pub /abcdefghijklmnopqrstuvwxyzabcdefgh test_pkg/aabbccddeeffgghhiij  -r 1 '{foo: 5}'

The callback is not triggered and roswtf shows:

ERROR The following nodes should be connected but aren't:
* /rostopic_477_1660895530813->/abcdefghijklmnopqrstuvw (/abcdefghijklmnopqrstuvwxyzabcd
efgh)

Now, changing the node name (in main.cpp or via rosrun test_pkg test_app __name:=xyz) or the topic or the msg name will resolve this issue. The problem doesn't seem to be related to the name itself but the total length of msg + topic + node. Just adding or removing one or more characters resolve the issue.

j3sq avatar Aug 19 '22 08:08 j3sq

Replying on my own issue to help others stumbling on this. The bug has nothing to do with UDP communication. The issue stems from a bug in the implementation of XmlRpcValue encoding of base64 values. This test can can be added to ros_comm/xmlrpcpp/test/xmlrpcvalue_base64.cpp to highlight the issue:

  TEST(xmlrpcvalue_base64, no_nulls)
  {
      bool success = true;
      for (int count = 0; count < 120; count++)
      {
          std::string data(count, 'a');
          XmlRpcValue value = base64Value(data);
          auto encoded_data = value.toXml();
          if (std::find(encoded_data.begin(), encoded_data.end(), '\0') != encoded_data.end())
          {
              printf("Failed for count = %d\n", count);
              success = false;
          }
      }
      ASSERT_TRUE(success);
  }

This test will fail for data with length that is multiple of 54 (72 chars in base64). The issue is found in the function XmlRpcValue::binaryToXml(). Specifically base64EncodeSize() underestimates the size by one when data length is multiple of 54 bytes. In the current implementation the base64::encoder::encode() will end the string with a new line and then bas64::encoder::encode_end() will add another new line (i.e. two consecutive '\n'). The encode_end write will be outside of xml string data and when resized again (with offset > capacity) whatever value was there will be now part of the xml string.

I can do a PR to fix the issue but I'm not sure what approach is the best among the three alternatives: 1- Fix the base64EncodeSize() function to consider this case 2- Update the base64::encoder::encode() so that it will not add a new line char if we're about to exit 3- Update the base64::encoder::encode_end() so that it doesn't add a '\n' if the last value in the encoded string was already a '\n'

j3sq avatar Oct 03 '22 15:10 j3sq