:farmer: connext test regressions in `test_subscription_topic_statistics`
Bug report
Required Info:
- Operating System:
- Linux Noble
- Installation type:
- Source
- Version or commit hash:
- Rolling, Jammy
- DDS implementation:
- RTI Connext
- Client library (if applicable):
- rclcpp
Steps to reproduce issue
- Run a build in Rci__nightly-connext_ubuntu_noble_amd64
- See test regressions
Additional information
Reference build: Rci__nightly-connext_ubuntu_noble_amd64#94
Test regressions:
- projectroot.test.rclcpp.test_subscription_topic_statistics
- rclcpp.TestSubscriptionTopicStatisticsFixture.test_receive_stats_for_message_no_header rclcpp.TestSubscriptionTopicStatisticsFixture.test_receive_stats_include_window_reset
Log output:
[ RUN ] TestSubscriptionTopicStatisticsFixture.test_receive_stats_for_message_no_header
RTI Connext DDS Non-commercial license is for academic, research, evaluation and personal use only. USE FOR COMMERCIAL PURPOSES IS PROHIBITED. See RTI_LICENSE.TXT for terms. Download free tools at rti.com/ncl. License issued to Non-Commercial User [email protected] For non-production use only.
Expires on 00-jan-00 See www.rti.com for more information.
/tmp/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp:355: Failure
Expected equality of these values:
kNumExpectedMessageAgeMessages
Which is: 4
message_age_count
Which is: 0
/tmp/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp:356: Failure
Expected equality of these values:
kNumExpectedMessagePeriodMessages
Which is: 4
message_period_count
Which is: 8
[ FAILED ] TestSubscriptionTopicStatisticsFixture.test_receive_stats_for_message_no_header (8331 ms)
...
[ RUN ] TestSubscriptionTopicStatisticsFixture.test_receive_stats_include_window_reset
RTI Connext DDS Non-commercial license is for academic, research, evaluation and personal use only. USE FOR COMMERCIAL PURPOSES IS PROHIBITED. See RTI_LICENSE.TXT for terms. Download free tools at rti.com/ncl. License issued to Non-Commercial User [email protected] For non-production use only.
Expires on 00-jan-00 See www.rti.com for more information.
/tmp/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp:404: Failure
Expected: (stats_point.data) >= (message_age_offset), actual: 100.035641 vs 1000
/tmp/ws/src/ros2/rclcpp/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp:404: Failure
Expected: (stats_point.data) >= (message_age_offset), actual: 99.960970000000003 vs 1000
[ FAILED ] TestSubscriptionTopicStatisticsFixture.test_receive_stats_include_window_reset (8331 ms)
Failures are pointing to this source code: https://github.com/ros2/rclcpp/blob/rolling/rclcpp/test/rclcpp/topic_statistics/test_subscription_topic_statistics.cpp
this problem can be reproducible with current rolling today, as reported this only happens with https://github.com/ros2/rmw_connextdds.
source timestamp is assigned by RTI Connext to libstatistic allocator, and confirmed that topic statistics on the subscription publishes the data for both message_age and message_period. there seems to be no problem at this statistics publisher process.
https://github.com/ros2/rclcpp/blob/45adf6565f99042a5fdcad7b8830e69f5efb03e5/rclcpp/include/rclcpp/topic_statistics/subscription_topic_statistics.hpp#L143-L145
on the other hand, MetricsMessageCallback is only called with source name message_period.
https://github.com/ros2/rclcpp/blob/45adf6565f99042a5fdcad7b8830e69f5efb03e5/rclcpp/test/rclcpp/topic_statistics/test_topic_stats_utils.hpp#L132-L140
that fails the test since it expects the same number of messages for each source name.
it just looks like source name message_age metric message is never delivered to MetricsMessageSubscriber during this test.
i would like to hear from RTI developer for this behavior.
IMO, assumption that the same exact number of messages are delivered to each source name would be bad, since there is no guarantee that multiple source name messages come in order, that i think depends on the RMW implementation.
i would like to hear from RTI developer for this behavior.
@asorbini any chance that you or someone else from the RTI team may be able to shed some light here?
@clalancette You mentioned to assign this issue to you and add label more information needed.
@asorbini A friendly ping for comments about this issue.
@trubio-rti CC: @asorbini
this comes up with priority in this week's PMC meeting. according to https://github.com/ros2/rclcpp/issues/2588#issuecomment-2259355642, it just cannot receive the data from RTI connextdds from rmw perspective. could you taka a look at this why we are missing the data? (this only happens with RTI connextdds, but cyclone nor fastdds.)
@fujitatomoya: @fgallegosalido is the one working now on the Connext RMW
@trubio-rti thanks! @fgallegosalido let us know if you need any help.
@fujitatomoya we are trying to reproduce the issue. Could you clarify which Connext version is being used or is it just the one in the CI? Thanks.
@fgallegosalido that is 6.0.1.25-1.
I was able to reproduce it. Working on it
After a long investigation, we've found out that the issue was produced by the KEEP_LAST QoS on the topic statistics publisher. Basically, as types are unkeyed in ROS2, when two samples were sent so close to each other, the second sample would arrive to the publisher queue before the first one could be sent, therefore discarding the first one.
Changing the QoS in the test to use KEEP_ALL for the statistics publisher (in the class SubscriberWithTopicStatistics), fixes both tests (haven't had a failure in 100 runs). Basically change the constructor of the class SubscriberWithTopicStatistics by setting the history QoS to keep_all:
auto options = rclcpp::SubscriptionOptions();
options.topic_stats_options.state = rclcpp::TopicStatisticsState::Enable;
options.topic_stats_options.publish_period = publish_period;
options.topic_stats_options.qos.keep_all();
@fgallegosalido thanks for the analysis, i will try and confirm it right now.
@fgallegosalido i would like to confirm the connextdds behavior, can you check the comments in https://github.com/ros2/rclcpp/pull/2650?
@Crola1702 do we need to backport https://github.com/ros2/rclcpp/pull/2650 to downstream distributions?
@Crola1702 do we need to backport https://github.com/ros2/rclcpp/pull/2650 to downstream distributions?
Yes that's right
@fujitatomoya I opened https://github.com/ros2/rclcpp/pull/2657