rmw
Messages dropped with History "keep all" but not with "keep last"
I am creating a publisher and subscriber with QoS History "Keep all" and Durability "transient local".
If the publisher publishes more than 5000 messages and I then start the subscriber, it only captures the last 5000 messages, not all of them.
If I instead change the History to "keep last" with depth 50000 and do the same thing (starting the subscriber after publishing more than 5000 messages), the subscriber receives all the messages.
I want to know the maximum queue depth that Fast RTPS allows and that can be relied upon.
Publisher code:
#include <chrono>
#include <functional>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/int32.hpp"

using namespace std::chrono_literals;

class MinimalPublisher : public rclcpp::Node {
public:
  MinimalPublisher() : Node("minimal_publisher"), count_(0) {
    // Set up QoS with history keep last (depth 50000), reliability reliable, and durability transient local
    auto qos = rclcpp::QoS(rclcpp::KeepLast(50000)).reliable().transient_local();

    publisher_ = this->create_publisher<std_msgs::msg::Int32>("topic", qos);
    timer_ = this->create_wall_timer(
      200ms, std::bind(&MinimalPublisher::timer_callback, this));  // Changed to 200ms
  }

private:
  void timer_callback() {
    if (count_ >= 7000) {
      rclcpp::shutdown();  // Stop publishing after 7000 messages
      return;
    }
    auto message = std_msgs::msg::Int32();
    message.data = count_;
    RCLCPP_INFO(this->get_logger(), "Publishing: '%d'", message.data);
    publisher_->publish(message);
    count_++;
  }

  rclcpp::TimerBase::SharedPtr timer_;
  rclcpp::Publisher<std_msgs::msg::Int32>::SharedPtr publisher_;
  size_t count_;
};

int main(int argc, char * argv[]) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MinimalPublisher>());
  rclcpp::shutdown();
  return 0;
}
Subscriber code:
#include <functional>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/int32.hpp"

using std::placeholders::_1;

class MinimalSubscriber : public rclcpp::Node {
public:
  MinimalSubscriber() : Node("minimal_subscriber") {
    // Set up QoS with history keep last (depth 50000), reliability reliable, and durability transient local
    auto qos = rclcpp::QoS(rclcpp::KeepLast(50000)).reliable().transient_local();

    subscription_ = this->create_subscription<std_msgs::msg::Int32>(
      "/topic", qos, std::bind(&MinimalSubscriber::topic_callback, this, _1));
  }

private:
  void topic_callback(const std_msgs::msg::Int32::SharedPtr msg) const {
    RCLCPP_INFO(this->get_logger(), "I heard: '%d'", msg->data);
  }

  rclcpp::Subscription<std_msgs::msg::Int32>::SharedPtr subscription_;
};

int main(int argc, char * argv[]) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<MinimalSubscriber>());
  rclcpp::shutdown();
  return 0;
}
Please change the QoS values in the code as needed.
@theksg
as far as I know, this is the designed behavior.
there is ResourceLimitsQosPolicy (https://fast-dds.docs.eprosima.com/en/v3.0.0/fastdds/api_reference/dds_pim/core/policy/resourcelimitsqospolicy.html), defined in the DDS specification, that determines the limits:
int32_t max_samples: a value less than or equal to 0 means infinite resources. By default, 5000.
if you want to increase the limits,
- specify the limits with KEEP_LAST, as you already confirmed, or
- apply an XML configuration policy to change max_samples (see the sketch below).
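for example, an XML profiles file along these lines could raise (or remove) the writer's sample limit. this is only a sketch: the profile name and file name are made up, and whether rmw_fastrtps picks up every field can depend on the Fast DDS / rmw_fastrtps version, so please verify against the Fast DDS XML profiles documentation.

<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical file: custom_limits.xml -->
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <data_writer profile_name="large_history_writer" is_default_profile="true">
        <topic>
            <resourceLimitsQos>
                <!-- values less than or equal to 0 mean "infinite" resources (default max_samples is 5000) -->
                <max_samples>0</max_samples>
                <max_instances>0</max_instances>
                <max_samples_per_instance>0</max_samples_per_instance>
            </resourceLimitsQos>
        </topic>
    </data_writer>
</profiles>

then point Fast DDS at the file before launching the nodes, e.g. via the FASTRTPS_DEFAULT_PROFILES_FILE environment variable; a matching <data_reader> profile would likely be needed on the subscription side if the reader also has to hold that many samples.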
hopefully this helps.
@fujitatomoya thanks for replying. If resource exhaustion happens, it will lead to the node crashing. Is that the expected behavior?
i am not quite following this, can you rephrase the question a bit?
With a publisher using QoS depth 50,000 and transient local durability, sending 10 MB images can lead to a large queue. If memory is overwhelmed, the node might crash. Is this expected, and should developers prevent such crashes? Or is there some mechanism to prevent such a failure?
i guess that should be detected at configuration time, before runtime, if you are using POD data, because it would try to allocate a buffer much bigger than what the system actually has? if it is variable-length data, the DataWriter::write() call may fail and throw an exception? honestly i have never tried that, and it may be dependent on the rmw implementation.
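on the rclcpp side, if the failure does propagate up, rclcpp::Publisher::publish() reports it by throwing (rclcpp::exceptions::RCLError and related exceptions), so one defensive pattern could look like the sketch below. i have not verified this under real resource exhaustion, the helper name try_publish is made up, and whether the error actually reaches user code (instead of blocking or being handled inside the middleware) is rmw-implementation dependent.

// sketch: catch a failed publish instead of letting it terminate the node.
#include "rclcpp/rclcpp.hpp"
#include "rclcpp/exceptions.hpp"
#include "std_msgs/msg/int32.hpp"

void try_publish(
  const rclcpp::Publisher<std_msgs::msg::Int32>::SharedPtr & publisher,
  const std_msgs::msg::Int32 & message,
  const rclcpp::Logger & logger)
{
  try {
    publisher->publish(message);
  } catch (const rclcpp::exceptions::RCLError & e) {
    // generic rcl/rmw error surfaced by rclcpp (e.g. the underlying write failing)
    RCLCPP_ERROR(logger, "publish failed: %s", e.what());
  } catch (const std::exception & e) {
    // other failures, e.g. an allocation error while preparing the message
    RCLCPP_ERROR(logger, "publish failed: %s", e.what());
  }
}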
CC: @eboasson @MiguelCompany @trubio-rti
So while we should do our best to not crash, I'm also curious what the use case is here. What are you trying to achieve with such a large depth of large images?
The scenario I am talking about is hypothetical and basically only intended to identify a case where a crash would happen. From the use case's perspective, I am writing a publisher with transient local durability and KeepAll history. I wondered whether KeepAll could make the node crash, so I was considering the limitations I need to keep in mind.
I understand that the default maximum queue depth for the KeepAll QoS history is 5000 for Fast DDS and 10000 for Cyclone DDS. (Although a hypothetical message size could exhaust even those, XD.)
Then I tried KeepLast and was surprised to see that its capacity can exceed what KeepAll can store, even though, going by the definitions, KeepAll should have the larger capacity.
I wanted to know whether a crash due to an overwhelmed message queue can be detected in advance, or whether we are relying on the developer to size the message queue so as to avoid crashing.
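For reference, this is roughly how the two variants I am comparing look in rclcpp (just a sketch; the helper function names are mine, and the effective KeepAll capacity is still bounded by the middleware's resource limits such as max_samples in Fast DDS):

#include "rclcpp/rclcpp.hpp"

// KEEP_ALL: no depth on the ROS side, so the middleware's resource limits
// (for Fast DDS, max_samples = 5000 by default) cap what is actually stored.
rclcpp::QoS make_keep_all_qos()
{
  return rclcpp::QoS(rclcpp::KeepAll()).reliable().transient_local();
}

// KEEP_LAST with a large depth: the requested depth acts as the cap, which
// is how this variant ended up holding more than the KEEP_ALL default above.
rclcpp::QoS make_keep_last_qos()
{
  return rclcpp::QoS(rclcpp::KeepLast(50000)).reliable().transient_local();
}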