rosidl icon indicating copy to clipboard operation
rosidl copied to clipboard

Segfault if there is an idl mismatch between a publisher and subscriber

Open rupertholman opened this issue 6 years ago • 4 comments

Bug report

Required Info:

  • Operating System:
    • Ubuntu 18.04
  • Installation type:
    • From source
  • Version or commit hash:
    • dashing-release (00b4b50be091a2007e8be2560304bbb64ec8527c)
  • DDS implementation:
    • rmw_coredx_cpp
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

This occurs if there is an idl mismatch between a publisher and subscriber on a topic. For a simple example if a publisher publishes a message with one string field, but the subscriber is expecting two string fields, then the subscriber will segfault.

Expected behavior

The subscriber would receive the message with the additional string fields as blank strings.

Actual behavior

Segfault

Additional information

I think this is due to strlen being called on a null string in the function rosidl_generator_c__String__assign defined in rosidl/rosidl_generator_c/src/string_functions.c

The coredx middleware seems to handle the missing field in the message by leaving it null in the dds type. This causes a problem when the null value is passed to rosidl_generator_c__String__assign

rupertholman avatar Dec 18 '19 06:12 rupertholman

Can you add an example of how to reproduce the issue? How is this related with the linked #44?

ivanpauno avatar Dec 18 '19 18:12 ivanpauno

Apologies, not sure how that linked bug got in there. I've removed it. I will construct and attach some example message files.

rupertholman avatar Dec 18 '19 23:12 rupertholman

Here is a simple set of workspaces with two message versions and simple pub sub scripts. I can't distribute the coredx middleware, which is necessary to reproduce the bug, as the fastrtps and opensplice dds middlewares will not deserialise the messages/

bug_report.zip

Run build.sh to build the docker image. docker create network bug_report

In two separate windows run the docker images

#This runs a subscriber expecting version 2 of the message docker run --rm --name=sub --network=bug_report -ti bug_report /wk/v2messagepython /wk/pub_sub/src/example_subscriber.py

#This will run a publisher publishing version 1 of the message docker run --rm --name=pub --network=bug_report -ti bug_report /wk/v1messagepython /wk/pub_sub/src/example_publisher.py

The subscriber window should print the message below, but will segfault. (Only if using a dds implementation that can handle the missing fields)

Example message version:1 header:This is v1 header 1 field

rupertholman avatar Dec 19 '19 06:12 rupertholman

@rupertholman Thanks for the example.

My first question is, why a publisher and a subscriber using different topic types are communicating? Is that allowed at all by the DDS standard?

I think this is due to strlen being called on a null string in the function rosidl_generator_c__String__assign defined in rosidl/rosidl_generator_c/src/string_functions.c

I have debugged with gdb, and the error came directly from FastCDR (the example you shared is using fastrtps, right?).

Backtrace
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
  #1  0x00007ffff7a24801 in __GI_abort () at abort.c:79
  #2  0x00007ffff4181957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  #3  0x00007ffff4187ab6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  #4  0x00007ffff4187af1 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  #5  0x00007ffff4187d24 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  #6  0x00007ffff180296f in eprosima::fastcdr::Cdr::deserialize(int&) () from /opt/ros/dashing/lib/libfastcdr.so.1
  #7  0x00007ffff18035f7 in eprosima::fastcdr::Cdr::readString(unsigned int&) () from /opt/ros/dashing/lib/libfastcdr.so.1
  #8  0x00007fffdf9f8c42 in eprosima::fastcdr::Cdr::deserialize(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
    from /wk/v2/example_interfaces/install/example_interfaces/lib/libexample_interfaces__rosidl_typesupport_fastrtps_c.so
  #9  0x00007fffdf9f8bdf in eprosima::fastcdr::Cdr::operator>>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
    from /wk/v2/example_interfaces/install/example_interfaces/lib/libexample_interfaces__rosidl_typesupport_fastrtps_c.so
  #10 0x00007fffdf9f87fe in _Example__cdr_deserialize () from /wk/v2/example_interfaces/install/example_interfaces/lib/libexample_interfaces__rosidl_typesupport_fastrtps_c.so
  #11 0x00007ffff280f69c in ?? () from /opt/ros/dashing/lib/librmw_fastrtps_cpp.so
  #12 0x00007ffff21d8632 in rmw_fastrtps_shared_cpp::TypeSupport::deserialize(eprosima::fastrtps::rtps::SerializedPayload_t*, void*) () from /opt/ros/dashing/lib/librmw_fastrtps_shared_cpp.so
  #13 0x00007ffff1c2a0f9 in eprosima::fastrtps::SubscriberHistory::deserialize_change(eprosima::fastrtps::rtps::CacheChange_t*, unsigned short, void*, eprosima::fastrtps::SampleInfo_t*) ()
    from /opt/ros/dashing/lib/libfastrtps.so.1
  #14 0x00007ffff1c2f9a0 in eprosima::fastrtps::SubscriberHistory::takeNextData(void*, eprosima::fastrtps::SampleInfo_t*) () from /opt/ros/dashing/lib/libfastrtps.so.1
  #15 0x00007ffff21d5913 in rmw_fastrtps_shared_cpp::_take(char const*, rmw_subscription_t const*, void*, bool*, rmw_message_info_t*, rmw_subscription_allocation_t*) ()
    from /opt/ros/dashing/lib/librmw_fastrtps_shared_cpp.so
  #16 0x00007ffff5889d18 in rcl_take () from /opt/ros/dashing/lib/librcl.so

I can't distribute the coredx middleware, which is necessary to reproduce the bug, as the fastrtps and opensplice dds middlewares will not deserialise the messages.

It seems that fastrtps is also trying to deserializing the message, and crashing .... Is that supposed to happen? i.e.: Is it allowed to register two DDS types with the same name, but different message definitions? And, are publisher/subscribers using those different messages supposed to communicate? Can you also provide a traceback of what the error is when using coredx?

ivanpauno avatar Dec 19 '19 13:12 ivanpauno