
[RTPS_TRANSPORT_SHM Error] Failed init_port fastrtps_port7415: open_and_lock_file failed -> Function open_port_internal [13645]

TSC21 opened this issue 3 years ago • 23 comments

The error above appears when launching a FastDDS 2.3.1 application generated with FastRTPSGen, and also when using ROS2 nodes in ROS2 Galactic with rmw_fastrtps_cpp as the RMW and ROS_LOCALHOST_ONLY set. Note that the same behavior does not occur with ROS2 Foxy (i.e. FastDDS 2.0.2) on the same platform.

Expected Behavior

The error above should not appear, and one should be able to use the Shared Memory transport.

Current Behavior

The error appears and one is unable to use the Shared Memory transport.

Steps to Reproduce

It seems to be platform specific, as I don't see this on my laptop, but the error description should provide enough context as to what the problem might be, and maybe you can suggest a possible solution. Note that in the FastDDS app the following code (which whitelists localhost and enables the Shared Memory transport) is run for both publishers and subscribers:

// Create a custom network UDPv4 transport descriptor
// to whitelist the localhost
auto localhostUdpTransport = std::make_shared<UDPv4TransportDescriptor>();
localhostUdpTransport->interfaceWhiteList.emplace_back("127.0.0.1");

// Disable the built-in Transport Layer
PParam.rtps.useBuiltinTransports = false;

// Add the descriptor as a custom user transport
PParam.rtps.userTransports.push_back(localhostUdpTransport);

// Add shared memory transport when available
auto shmTransport = std::make_shared<SharedMemTransportDescriptor>();
PParam.rtps.userTransports.push_back(shmTransport);
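
For context, below is a minimal sketch of the declarations the snippet above relies on. The header paths and namespaces follow the Fast DDS 2.3.x layout and should be treated as an assumption to verify against the installed version; PParam stands for the participant attributes later passed to the legacy Domain::createParticipant() API.

// Includes and declarations assumed by the snippet above (paths per Fast DDS 2.3.x; verify locally)
#include <fastrtps/attributes/ParticipantAttributes.h>
#include <fastdds/rtps/transport/UDPv4TransportDescriptor.h>
#include <fastdds/rtps/transport/shared_mem/SharedMemTransportDescriptor.h>

using eprosima::fastdds::rtps::UDPv4TransportDescriptor;
using eprosima::fastdds::rtps::SharedMemTransportDescriptor;

// Participant attributes object configured by the snippet above
eprosima::fastrtps::ParticipantAttributes PParam;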

System information

The platform is an Ubuntu 20.04 container on an arm64/aarch64 SOM with a Linux kernel based on 4.14.98.

  • Fast-RTPS version: 2.3.1
  • OS: Ubuntu 20.04
  • Network interfaces: lo (127.0.0.1)
  • ROS2: Galactic

Additional context

Additional resources

  • Wireshark capture: This is running on a SOM over the lo interface, so I am not entirely sure how I can capture there.
  • XML profiles file: N.A.

Thanks in advance for the help! @MiguelCompany @Dani-Cabezas

TSC21 avatar Jun 09 '21 06:06 TSC21

@TSC21 The only changes related to this are the ones on #1788, but they should only take effect when __QNXNTO__ is defined during the build.

These changes are also present on branch 2.0.x; could you check whether you also have those failures with that branch?

MiguelCompany avatar Jun 09 '21 07:06 MiguelCompany

As said in the description, I don't have these problems with ROS2 Foxy, which defaults to FastDDS 2.0.2.

TSC21 avatar Jun 09 '21 08:06 TSC21

Yeah, but 2.0.x has additional changes, including commit 81cda6ae802640b526d683b6ef98b38d3c02ad2f. This means that Foxy built from source may also have the problem.

MiguelCompany avatar Jun 09 '21 08:06 MiguelCompany

Well, at this point I cannot build the entire distro from source on the platform I am using; it would take forever and it's not an option for me right now, unless I can test this on my laptop. If so, can you provide the steps for using that branch with Foxy and building it from source? Thanks.

TSC21 avatar Jun 09 '21 08:06 TSC21

@TSC21 I'm just asking that you try Fast DDS alone with branch 2.0.x, as you have done with v2.3.1. I'm asking this to check whether commit 81cda6ae802640b526d683b6ef98b38d3c02ad2f is the one responsible. In fact, it would be better if you could check with that commit directly.

MiguelCompany avatar Jun 09 '21 08:06 MiguelCompany

I have not directly tested FastDDS from the repo; I just used the versions provided by ROS2 (Foxy - 2.0.2, Galactic - 2.3.1). Building and installing FastDDS from source on the platform just to test this with FastDDSGen is an option, but I don't know if I can manage it easily.

TSC21 avatar Jun 09 '21 08:06 TSC21

I am getting the same error in the official Docker container of the ROS2 Foxy distribution.

2021-06-09 10:56:04.734 [RTPS_TRANSPORT_SHM Error] Failed init_port fastrtps_port7435: open_and_lock_file failed -> Function open_port_internal

It appears when I try to publish a message on a topic within the Docker container.

erdemuysalx avatar Jun 09 '21 11:06 erdemuysalx

@erd3muysal thanks for the input. I am also using Docker containers on the platform, so it might be the case that this is actually Docker related. What I find odd is that I am able to use ROS2 Foxy inside the containers on the platform without the above happening, but not Galactic. Maybe I am not using the latest Foxy available, though.

@erd3muysal are you able to reproduce the same in Galactic?

TSC21 avatar Jun 09 '21 11:06 TSC21

@TSC21 It just disappeared a minute ago without any interference, but now it has popped up again.

I have a slightly weird configuration here: two separate containers, the first one running the ros:latest image and the other one running Gazebo. I am trying to move the robot in Gazebo by publishing messages to the relevant topic, but the error mentioned above appears.

No, I did not experience this on Galactic; actually, I did not even try it there.

erdemuysalx avatar Jun 09 '21 11:06 erdemuysalx

@TSC21 @erd3muysal Could this be related to #1755 ?

MiguelCompany avatar Jun 09 '21 12:06 MiguelCompany

I don't think so, because I don't have this problem when using ROS2 Foxy and the packages built against it (including the FastRTPSGen-generated app). And both the nodes and the app run in the same container in my case.

TSC21 avatar Jun 09 '21 12:06 TSC21

@MiguelCompany Thank you for the reference to #1755. It seems like the error is gone, but I am still not able to see published messages on the topic.

erdemuysalx avatar Jun 09 '21 14:06 erdemuysalx

I just want to confirm that I observed the exact same error message on our robot running ROS2 Foxy as well. It occurred only once; on a subsequent launch it did not appear. ROS_LOCALHOST_ONLY is not set.

sebhaug avatar Jun 14 '21 09:06 sebhaug

@MiguelCompany any progress in this?

TSC21 avatar Jun 24 '21 15:06 TSC21

I'm getting this same error when running code generated by FastDDS. No ROS involved in my case. Any thoughts?

eddiem3 avatar Jul 01 '21 13:07 eddiem3

I'm getting this same error when trying to communicate between ROS2 Foxy (using Fast-DDS 2.1.x) and another version of Fast DDS. What I did exactly was try to reproduce this article: https://gist.github.com/EduPonz/bea0edf3e1ac366560eff62cceb5ddf9. I found out that it only works before commit #1856 (https://github.com/eProsima/Fast-DDS/commit/12c9f9ef0297329e93139f6366b84fc6a9a42c76) and produces the error afterwards. Also, the Integration Service has the same problem. So the problem can disappear if you use the same version of Fast-DDS.

yushuhuang avatar Aug 19 '21 09:08 yushuhuang

I run into this error as well when communicating between the pre-installed ROS2 stack and my own application, which links against a statically compiled version of FastDDS.

Somehow, with version 2.3.4, I tend to see it when using the server.

jespersmith avatar Aug 24 '21 18:08 jespersmith

Is there any update on this issue? This seems to be consistent on Windows.

karthiknit1 avatar Jun 10 '22 10:06 karthiknit1

This error is shown when some shared memory files have not been correctly freed because the Fast DDS application crashed or was not closed cleanly. The Fast DDS CLI provides an option to clean zombie files: fastdds shm clean. The issue is that if a file is still locked because Fast DDS was closed unexpectedly, this tool cannot remove it. In that case, the only option is to remove these files manually. The shared memory files are saved in the following folders and have fastrtps in their filenames (a lookup sketch follows the list):

  • Linux: /dev/shm/
  • MacOS: /private/tmp/boost_interprocess/
  • Windows: C:\programdata\eprosima\fastrtps_interprocess\
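
For reference, a minimal way to inspect these files on Linux, assuming the /dev/shm location listed above and a shell with standard tools, could be:

ls -l /dev/shm | grep fastrtps    # list Fast DDS shared memory ports and segments
fastdds shm clean                 # let the CLI remove any zombie files it can

Exact filenames vary per process, so fastdds shm clean should always be tried before any manual removal.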

JLBuenoLopez avatar Jun 20 '22 10:06 JLBuenoLopez

@JLBuenoLopez-eProsima I encountered the same issue. I have multiple Fast DDS processes running on my system; how do I know which filenames are used by the process that crashed? Can I manually specify an SHM file prefix for each process in order to delete the crashed process's files accurately?

Regards,

duchengyao avatar Jul 07 '22 07:07 duchengyao

@duchengyao I would suggest running fastdds shm clean first. It will report the number of removed ports and segments, as well as the ones still in use. For instance:

shm.clean:
4 ports in use
2 segments in use
2 zombie ports cleaned
1 zombie segments cleaned

If the reported number of cleaned ports and segments is 0, you could then try to remove all the files in the shared memory folder. The operating system will only let you remove the ones created by the process that crashed, since the other ones will have an exclusive lock in place.
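
As a cautious sketch of that manual fallback on Linux (assuming the /dev/shm location mentioned earlier and that the Fast DDS files there start with fastrtps, as in the error message above; preferably with no Fast DDS process still active, since a follow-up below reports that deleting files of live participants can break newly launched subscribers):

fastdds shm clean          # always try the CLI first
rm -i /dev/shm/fastrtps*   # then interactively remove the remaining Fast DDS SHM files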

MiguelCompany avatar Jul 07 '22 07:07 MiguelCompany

@MiguelCompany I have tried fastdds shm clean and removing all the files.

  1. The file cannot be removed with fastdds shm clean if the mutex is blocked.
  2. If I remove all the files in /dev/shm, a newly launched subscriber is then unable to receive any messages, although existing subscribers can still receive them. I also found that all the files in /dev/shm are indeed removed; it was not until I quit the publisher that the files reappeared.

My specific problem is described in this issue: https://github.com/eProsima/Fast-DDS/issues/2811

Regards,

duchengyao avatar Jul 07 '22 07:07 duchengyao

@MiguelCompany I've noticed that no matter what file I delete, it doesn't solve my problem. Is there a way to release the mutex when the publisher detects that the subscriber is not active (or dead)?

duchengyao avatar Jul 07 '22 10:07 duchengyao

As said in the description, I don't have these problems with ROS2 Foxy, which defaults to FastDDS 2.0.2.

I've met this problem in ROS2 Foxy, which uses fastdds 2.0.3.

ZhenshengLee avatar Nov 18 '22 15:11 ZhenshengLee

I have created a ticket, labelled as an enhancement, to improve the SHM Transport log messages and make them more helpful to the user (#3578). I am going to close this issue, as the cause of the log message has been explained. The other issue mentioned in the latest comments is being tracked in its own ticket (#2811). Finally, the Fast DDS version reported in this issue is no longer maintained.

JLBuenoLopez avatar Jun 09 '23 05:06 JLBuenoLopez