Fast-DDS
When these processes start at the same time, many dropped packets are generated on the 127.0.0.1 network
Is there an already existing issue for this?
- [X] I have searched the existing issues
Expected behavior
- There are 20 processes and a total of 130 topics running on the same machine.
- QoS: both UDP and SHM are enabled (see the sketch after this list);
udp_transport->interfaceWhiteList.push_back("127.0.0.1");
This means that discovery traffic uses 127.0.0.1 for UDP communication and user data uses SHM communication.
- When these processes start at the same time, we expect no packet loss on 127.0.0.1, which can be checked with ifconfig lo.
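A minimal sketch of the transport setup described above, assuming the Fast DDS 2.x C++ API (the helper function name is only illustrative; the whitelist entry mirrors the configuration quoted in this report):

#include <memory>
#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastdds/rtps/transport/UDPv4TransportDescriptor.h>
#include <fastdds/rtps/transport/shared_mem/SharedMemTransportDescriptor.h>

// Hypothetical helper: builds a participant QoS with UDP limited to loopback plus SHM
eprosima::fastdds::dds::DomainParticipantQos make_local_participant_qos()
{
    using namespace eprosima::fastdds::dds;
    using namespace eprosima::fastdds::rtps;

    DomainParticipantQos participant_qos;
    participant_qos.transport().use_builtin_transports = false;

    // UDP restricted to the loopback interface, so discovery metatraffic stays on 127.0.0.1
    auto udp_transport = std::make_shared<UDPv4TransportDescriptor>();
    udp_transport->interfaceWhiteList.push_back("127.0.0.1");
    participant_qos.transport().user_transports.push_back(udp_transport);

    // Shared memory transport for user data exchanged between processes on the same host
    auto shm_transport = std::make_shared<SharedMemTransportDescriptor>();
    participant_qos.transport().user_transports.push_back(shm_transport);

    return participant_qos;
}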
Current behavior
When these processes start at the same time, there is a lot of packet loss on 127.0.0.1, which can be seen with ifconfig lo.
We have tried many approaches, but none of them has worked:
- Increase the buffer sizes of the network adapters:
sudo sysctl -w net.core.wmem_max=209715200  # 200 MB
sudo sysctl -w net.core.rmem_max=209715200  # 200 MB
- Increase the socket buffer sizes in the QoS (see the sketch after this list):
"send_socket_buffer_size": 209715200, // 200 MB
"listen_socket_buffer_size": 209715200
- Increase the txqueuelen length:
ip link set txqueuelen 10000 dev lo
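For reference, a hedged sketch of setting the same socket buffer sizes through the C++ QoS instead of the JSON profile, assuming the send_socket_buffer_size and listen_socket_buffer_size members of TransportConfigQos (the 209715200 value is the one used above; the function name is illustrative):

#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>

// Request 200 MB socket buffers for this participant. The effective size is still
// capped by the kernel limits (net.core.rmem_max / net.core.wmem_max) set above.
void set_large_socket_buffers(eprosima::fastdds::dds::DomainParticipantQos& participant_qos)
{
    participant_qos.transport().send_socket_buffer_size = 209715200;
    participant_qos.transport().listen_socket_buffer_size = 209715200;
}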
Can you help me solve this problem?
Steps to reproduce
above
Fast DDS version/commit
v2.12.0
Platform/Architecture
Ubuntu Focal 20.04 arm64
Transport layer
Default configuration, UDPv4 & SHM
Additional context
No response
XML configuration file
No response
Relevant log output
No response
Network traffic capture
No response
Hi @TechVortexZ, thanks for using Fast DDS. Consider that 20 processes and 130 topics are enough to make the network very busy, so the packet loss may be related to this. If the loss occurs mostly during the discovery phase, you can try changing the initial announcement period: decreasing it will allow participants to be discovered more quickly, while increasing it will reduce the frequency of the metatraffic packets, leading to a less busy network. Please let us know if you get better performance with one of these solutions. Also, please note that version 2.12.x is end of life, so you may want to consider upgrading to our latest version, 2.14.x.
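A minimal sketch of what tuning the initial announcements can look like in C++, assuming the initial_announcements member of the builtin discovery settings (the count and period values below are placeholders, not recommendations):

#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastrtps/rtps/common/Time_t.h>

// Initial PDP announcements: how many are sent at startup and how often.
// A shorter period speeds up discovery; a longer one reduces the metatraffic burst.
void tune_initial_announcements(eprosima::fastdds::dds::DomainParticipantQos& participant_qos)
{
    participant_qos.wire_protocol().builtin.discovery_config.initial_announcements.count = 5;
    participant_qos.wire_protocol().builtin.discovery_config.initial_announcements.period =
            eprosima::fastrtps::Duration_t(0, 100000000);  // 100 ms
}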
Hi @elianalf, we decreased the initial announcement period:
"initial_announce_count": 5,
"initial_announce_period": 100ms,
but there are still lost packets.
When we modify the configuration "avoid_builtin_multicast": false, there are no lost packets. Can you tell me what this parameter does and why it solves the problem?
However, I noticed that the PDP message interval is not 100ms at startup, even though I set "initial_announce_period": 100ms. Why is this?
Hi,
When we modify the configuration "avoid_builtin_multicast": false, there are no lost packets. Can you tell me what this parameter does and why it solves the problem?
The avoid_builtin_multicast=false setting enables the use of multicast also during the Endpoint Discovery Phase (EDP). It reduces the number of packets sent during EDP, because each multicast datagram reaches all participants at once, thereby reducing the traffic.
You could also try re-enabling it with avoid_builtin_multicast=true and setting the TTL parameter of the UDPv4TransportDescriptor to 0. This way you can be sure that your traffic stays local. To do that, you will also need to set use_builtin_transports=false and add a SharedMemTransportDescriptor and a UDPv4TransportDescriptor to the user transports:
DomainParticipantQos participant_qos;
// Disable the built-in transports so only the user-defined ones below are used
participant_qos.transport().use_builtin_transports = false;
// Shared memory transport for local user data
auto shm_transport = std::make_shared<SharedMemTransportDescriptor>();
participant_qos.transport().user_transports.push_back(shm_transport);
// UDPv4 transport with TTL = 0 so the traffic never leaves the local host
auto udp_transport = std::make_shared<UDPv4TransportDescriptor>();
udp_transport->TTL = 0;
participant_qos.transport().user_transports.push_back(udp_transport);
However, I noticed that the PDP message interval is not 100ms at startup, even though I set "initial_announce_period": 100ms. Why is this?
I would need more information about the screenshot. From the information I have, I can tell you that initial_announce_period sets the period for each individual participant; maybe the timestamps you are looking at come from different participants, so the difference between them is not 100ms.
Hi @elianalf, thanks for your reply.
I set avoid_builtin_multicast=true and udp_transport->TTL = 0;, and also enabled both UDP and SHM, following the reference code you provided, but there are still lost packets.
I would need more information about the screenshot. From the information I have, I can tell you that initial_announce_period sets the period for each individual participant; maybe the timestamps you are looking at come from different participants, so the difference between them is not 100ms.
Here are more screenshots to illustrate the PDP messages sent by the same participant.
Hi,
I set avoid_builtin_multicast=true and udp_transport->TTL = 0;, and also enabled both UDP and SHM, following the reference code you provided, but there are still lost packets.
If your application only needs to work on the local host and you get better performance by setting avoid_builtin_multicast=false, then that is a possible solution. The variable is set to true by default because disabling multicast during EDP can be safer on big networks.
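For completeness, a hedged sketch of what this setting looks like in C++, assuming the avoid_builtin_multicast member of the builtin wire protocol attributes (the equivalent of the "avoid_builtin_multicast": false entry used above; the function name is illustrative):

#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>

// Allow builtin multicast during EDP, so endpoint announcements are sent once per
// multicast locator instead of separately to every known participant's unicast locator.
void allow_builtin_multicast(eprosima::fastdds::dds::DomainParticipantQos& participant_qos)
{
    participant_qos.wire_protocol().builtin.avoid_builtin_multicast = false;
}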
Here are more screenshots to illustrate the PDP messages sent by the same participant.
These packages are not all initialAnnouncements packages. Each participant sends an initialAnnouncements package every initial_announce_period, but every time it discovers a participant it starts sending Data(p) packages to each multicast locator and to all known participants' unicast locators. So between two initialAnnouncements packages there might be many other Data(p) packages. That is why the frequency of the packages you highlighted is higher.
Hi @elianalf, thanks for your reply. Your answer above is right.
I want to ask one last question. I found an article on the Fast DDS website: https://www.eprosima.com/index.php/resources-all/scalability/fast-rtps-discovery-mechanisms-analysis. One of its conclusions is that the SDP causes network congestion:
Because of all the previous, it is concluded that the SDP produces network congestion in those cases where a high number of participants are involved in the communication. This leads to a higher packet loss and therefore to a reduction of the overall performance. The protocol implementation is open to optimizations, such as eliminating the duplicate announcements when new participants are discovered (which could lead to a PDP traffic reduction of around 28%), or limiting the announcement reply to a discovered participant to just that new participant (which could cut another 25% of the traffic in the testing scenarios).
It says that Fast DDS will provide optimization measures to reduce duplicate announcements. What are these optimization measures?
Hi, the article refers to the Discovery Server mechanism. For any other information, I would recommend referring to the documentation rather than the website, because it is more detailed and constantly updated.