Fast-DDS
[RTPS_TRANSPORT_SHM Error] Failed init_port fastrtps_port7415: open_and_lock_file failed -> Function open_port_internal [13645]
The error above appears when launching a FastDDS 2.3.1 application generated with FastRTPSGen, and also when using ROS2 nodes with rmw_fastrtps_cpp as the RMW in ROS2 Galactic with ROS_LOCALHOST_ONLY set. Note that the same behavior doesn't happen using ROS2 Foxy (i.e. FastDDS 2.0.2) on the same platform.
Expected Behavior
The error above should not appear and one should be able to use the Shared Memory transport.
Current Behavior
The error appears and one is unable to use the SharedMemory transport.
Steps to Reproduce
It seems to be platform specific, as I don't see this on my laptop. But the error description should provide enough context on what the problem might be, and maybe you can suggest a possible solution. Note that in the FastDDS app the following code (which whitelists the localhost and uses the Shared Memory transport) is run for both publishers and subscribers:
// Create a custom network UDPv4 transport descriptor
// to whitelist the localhost
auto localhostUdpTransport = std::make_shared<UDPv4TransportDescriptor>();
localhostUdpTransport->interfaceWhiteList.emplace_back("127.0.0.1");
// Disable the built-in Transport Layer
PParam.rtps.useBuiltinTransports = false;
// Add the descriptor as a custom user transport
PParam.rtps.userTransports.push_back(localhostUdpTransport);
// Add shared memory transport when available
auto shmTransport = std::make_shared<SharedMemTransportDescriptor>();
PParam.rtps.userTransports.push_back(shmTransport);
System information
The platform is an Ubuntu 20.04 container on an arm64/aarch64 SOM with a Linux kernel based on 4.14.98.
- Fast-RTPS version: 2.3.1
- OS: Ubuntu 20.04
- Network interfaces: lo (127.0.0.1)
- ROS2: Galactic
Additional context
Additional resources
- Wireshark capture: This is running on a SOM in the lo, so I am not entirely sure how I can capture there.
- XML profiles file: N.A.
Thanks in advance for the help! @MiguelCompany @Dani-Cabezas
@TSC21 The only changes related to this are the ones on #1788, but they should only have an effect when __QNXNTO__ is defined during build.
These changes are also present on branch 2.0.x. Could you check if you also have those failures with that branch?
As said in the description, I don't have these problems with ROS2 Foxy, which defaults to FastDDS 2.0.2.
Yeah, but 2.0.x has additional changes, including commit 81cda6ae802640b526d683b6ef98b38d3c02ad2f. This means that building Foxy from sources may also have the problem.
Well but I cannot at this point build the entire distro from source in the platform I am using. It would take forever and it's not an option for me at this point. Unless I can test this on my laptop. If yes, can you provide the steps here on how to use that branch with Foxy and build it from source? Thanks.
@TSC21 I'm just asking that you try with Fast DDS alone and branch 2.0.x, as you have done with v2.3.1. I'm asking this to check whether commit 81cda6ae802640b526d683b6ef98b38d3c02ad2f is responsible. In fact, it would be better if you could check with that commit directly.
I have not directly tested FastDDS from the repo. I just used the ones provided by ROS2 (Foxy - 2.0.2, Galactic - 2.3.1). Building and installing FastDDS from source on the platform just to test this with FastDDSGen is an option but I don't know if I can manage it easily.
I am getting the same error in the official Docker container of ROS2 Foxy distribution.
2021-06-09 10:56:04.734 [RTPS_TRANSPORT_SHM Error] Failed init_port fastrtps_port7435: open_and_lock_file failed -> Function open_port_internal
It appears when I try to publish a message on a topic within the docker container.
@erd3muysal thanks for the input. I am also using Docker containers on the platform. So it might be the case this is actually docker related. What I find awkward is that I am able to use ROS2 Foxy inside the containers on the platform without the above happening. But not with Galactic. Maybe I am not using the latest Foxy made available though in this case.
@erd3muysal are you able to reproduce the same in Galactic?
@TSC21 It just disappeared a minute ago without any intervention. But now it has popped up again.
I have a little bit of weird configuration here; having two separate containers while the first one running ros:latest container, the other one is running Gazebo. I am trying to move the robot in Gazebo, by pushing messages to the relevant topic. But the error mentioned above appears.
No, I did not experience this on Galactic, actually, I did not even try on there.
@TSC21 @erd3muysal Could this be related to #1755 ?
I don't think so because I don't have this problem when using ROS2 Foxy and the packages built against it (including also the FastRTPSGen generated app). And both nodes and app run on the same container on my case.
@MiguelCompany Thank you for your reference to #1755. It seems like the error has been gone, but I am still not able to see published messages on the topic.
I just want to confirm that I observed the exact same error message on our robot running ROS2 Foxy as well. It occurred only once; on a subsequent launch it did not appear. ROS_LOCALHOST_ONLY is not set.
@MiguelCompany any progress in this?
I'm getting this same error when running code generated by FastDDS. No ROS involved in my case. Any thoughts?
I'm getting this same error when trying to make ROS2 Foxy (using Fast-DDS 2.1.x) communicate with another version of FastDDS. Specifically, I tried to reproduce this article: https://gist.github.com/EduPonz/bea0edf3e1ac366560eff62cceb5ddf9. I found that it only works before commit #1856 https://github.com/eProsima/Fast-DDS/commit/12c9f9ef0297329e93139f6366b84fc6a9a42c76, and produces the error afterwards. The integration service has the same problem. So the problem can disappear if you use the same version of Fast-DDS.
I run into this error as well when communicating between the pre-installed ROS2 stack and my own application that links to a statically compiled version of FastDDS.
Somehow with version 2.3.4, I tend to see it when using the server.
Is there any update on this issue? This seems to be consistent on windows.
This error is shown when some shared memory files have not been correctly freed because the Fast DDS application crashed or was not closed cleanly. Fast DDS CLI provides an option to clean zombie files: fastdds shm clean. The issue is that if the file is still blocked because Fast DDS was closed unexpectedly, this tool cannot remove the file. Then the only option is to remove these files manually. The shared memory files are saved in the following folders and have fastrtps included in their filenames:
- Linux:
/dev/shm/
- MacOS:
/private/tmp/boost_interprocess/
- Windows:
C:\programdata\eprosima\fastrtps_interprocess\
@JLBuenoLopez-eProsima I encountered the same issue. I have multiple fast-dds processes running in my system, how do I know which filename is used by the currently crashed process. Can I manually specify a prefix of shm file for each process in order to delete crashed files accurately?
Regards,
@duchengyao I would suggest running fastdds shm clean first. It will report the number of removed segments, as well as the ones still in use. For instance:
shm.clean:
4 ports in use
2 segments in use
2 zombie ports cleaned
1 zombie segments cleaned
If the reported number of cleaned ports and segments is 0, you could then try to remove all files on the shared folder. The operating system will only let you remove the ones created by the process that crashed, since the other ones will have an exclusive lock in place.
@MiguelCompany
I have tried fastdds shm clean and removing all files.
- The file cannot be removed using fastdds shm clean if the mutex is blocked.
- If I remove all files in /dev/shm, a newly launched subscriber is unable to receive any messages, although existing subscribers still receive them. I also found that all files in /dev/shm are indeed removed, and stay removed until I quit the publisher, at which point the file reappears.
My specific problem is in this issue. https://github.com/eProsima/Fast-DDS/issues/2811
Regards,
@MiguelCompany I've noticed that no matter what file I delete, it doesn't solve my problem. Is there a way to release the mutex when the publisher detects that the subscriber is not active (or dead)?
As said in the description, I don't have these problems with ROS2 Foxy, which defaults to FastDDS 2.0.2.
I've hit this problem in ROS2 Foxy, which uses Fast DDS 2.0.3.
I have created a ticket labelled as enhancement to improve the SHM Transport logged messages in order to make them more helpful to the user (#3578). I am going to close this issue as the cause for the log message has been explained. The other issue mentioned in the latest comments is being tracked in its own ticket (#2811). Finally, the Fast DDS version is no longer maintained.