Shared mem partition fills up after many runs: how to garbage collect?

Open Aposhian opened this issue 3 years ago • 10 comments

Is there an already existing issue for this?

  • [X] I have searched the existing issues

Expected behavior

I should be able to turn my system on and off many times with it operating the same every time.

Current behavior

After bringing up the system a number of times, my /dev/shm tmpfs partition fills up, no matter how big it is. Currently, mine is 32GB and it still fills up.

Steps to reproduce

Restart a participant many times in a loop until it can no longer create new shared mem segments.

Fast DDS version/commit

2.6.0-3jammy.20220520.002055

Platform/Architecture

Ubuntu Focal 20.04 amd64

Transport layer

Default configuration, UDPv4 & SHM

Additional context

When I run FastDDS participants many times, my /dev/shm partition fills up with segments until no more can be created. This is with /dev/shm being 32 GB in size. Does FastDDS not have any garbage collection capabilities, or does it just rely on machine restart?

This is running in containers with ipc: host and network: host

XML configuration file

No response

Relevant log output

[RTPS_TRANSPORT_SHM Error] Failed to create segment fastrtps_796f9c10effb4cf1: No such file or directory -> Function Segment
[RTPS_MSG_OUT Error] No such file or directory -> Function init
[RTPS_PARTICIPANT Error] Unable to Register SHM Transport. SHM Transport is not supported in the current platform.

Network traffic capture

No response

Aposhian avatar Jun 29 '22 18:06 Aposhian

Hi @Aposhian,

The shared memory files are cleaned up if the application using Fast DDS exits cleanly, using the corresponding API to delete the DDS DomainParticipant. If your application crashes, or you close it without calling DomainParticipantFactory::delete_participant, the files are kept. Fast DDS does try to reuse the shared memory files from previous runs. However, because the application did not exit cleanly, some of these files can still be marked as blocked (you may find this comment helpful).
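To illustrate the clean-exit requirement described above, here is a minimal Python sketch of turning SIGINT into a normal shutdown path. The `GracefulShutdown` class and the `run`, `step` and `cleanup` names are hypothetical. `cleanup` only stands in for whatever code eventually calls DomainParticipantFactory::delete_participant; nothing here is taken from Fast DDS or rmw_fastrtps.

```python
import signal


class GracefulShutdown:
    """Turn SIGINT/SIGTERM into a flag so the main loop can exit and clean up."""

    def __init__(self, signals=(signal.SIGINT, signal.SIGTERM)):
        self.requested = False
        for sig in signals:
            # Must be called from the main thread of the process.
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Only record the request; the real cleanup runs in the main loop.
        self.requested = True


def run(step, cleanup, shutdown):
    """Call step() until a shutdown is requested, then always call cleanup()."""
    try:
        while not shutdown.requested:
            step()
    finally:
        # Hypothetical stand-in for deleting the DomainParticipant.
        cleanup()
```

The point of the design is that the signal handler never tears anything down itself; it only asks the main loop to stop, so the cleanup runs on the normal code path. A process killed with SIGKILL, or one that crashes, still skips the cleanup.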

JLBuenoLopez avatar Jun 30 '22 05:06 JLBuenoLopez

I am using this from ROS2. I am typically stopping applications with SIGINT. Does rmw_fastrtps make a call to DomainParticipantFactory::delete_participant in that event?

Aposhian avatar Jun 30 '22 18:06 Aposhian

Is it safe to run fastdds shm clean while nodes are running?

Aposhian avatar Jun 30 '22 21:06 Aposhian

Is it safe to run fastdds shm clean while nodes are running?

It is safe. The result looks like this:

root@csc:/work/ros2_ws# fastdds shm clean
shm.clean:
4 ports in use
2 segments in use
0 zombie ports cleaned
0 zombie segments cleaned
root@csc:/work/ros2_ws# ls /dev/shm/
fast_datasharing_01.0f.3d.85.a9.09.eb.fa.01.00.00.00_0.0.12.3  fastrtps_port7412     fastrtps_port7415
fast_datasharing_01.0f.3d.85.b6.09.19.4d.01.00.00.00_0.0.12.4  fastrtps_port7412_el  fastrtps_port7415_el
fastrtps_1820ff8c0c530a42                                      fastrtps_port7413     sem.fastrtps_port7412_mutex
fastrtps_1820ff8c0c530a42_el                                   fastrtps_port7413_el  sem.fastrtps_port7413_mutex
fastrtps_b7e2475640426c8c                                      fastrtps_port7414     sem.fastrtps_port7414_mutex
fastrtps_b7e2475640426c8c_el                                   fastrtps_port7414_el  sem.fastrtps_port7415_mutex
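To see how much of /dev/shm is taken by files like the ones listed above, a small Python sketch can group them by name prefix. The prefixes are copied from this listing and may differ in other Fast DDS versions; `summarize_shm` is a hypothetical helper, not part of the fastdds tool. It reports apparent file sizes (`st_size`), which on tmpfs can overstate the memory actually allocated.

```python
import os
from collections import Counter

# Name prefixes taken from the listing above; treat them as an assumption.
PREFIXES = ("fastrtps_", "fast_datasharing_", "sem.fastrtps_")


def summarize_shm(directory="/dev/shm"):
    """Return {prefix: (file_count, total_bytes)} for Fast DDS-looking files."""
    counts, sizes = Counter(), Counter()
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        for prefix in PREFIXES:
            if entry.name.startswith(prefix):
                counts[prefix] += 1
                sizes[prefix] += entry.stat(follow_symlinks=False).st_size
                break
    return {prefix: (counts[prefix], sizes[prefix]) for prefix in PREFIXES}


if __name__ == "__main__":
    for prefix, (count, total) in summarize_shm().items():
        print(f"{prefix:<20} {count:>6} files {total:>14} bytes")
```

Running it before and after a restart loop shows whether leftover segments are accumulating.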

llapx avatar Jul 06 '22 07:07 llapx

@Aposhian

After bringing up the system a number of times...

fastdds shm clean is a Python script. You can call it when bringing the system down to ensure the temporary files are cleaned up.
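A shutdown hook along those lines could look like the following Python sketch. It only assumes that `fastdds shm clean` is on the PATH and prints counters in the format pasted earlier in this thread; `shm_clean` and `parse_clean_report` are hypothetical helper names, and the output format may change between versions.

```python
import re
import shutil
import subprocess


def parse_clean_report(text):
    """Extract counters such as 'zombie segments cleaned' from the tool output."""
    report = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            report[match.group(2)] = int(match.group(1))
    return report


def shm_clean(command=("fastdds", "shm", "clean")):
    """Run the cleanup tool; return parsed counters, or None if it is missing."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_clean_report(result.stdout)


if __name__ == "__main__":
    print(shm_clean())
```

Calling `shm_clean()` as the last step of a bring-down script gives a record of how many zombie ports and segments were removed on each cycle.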

llapx avatar Jul 07 '22 05:07 llapx

@JLBuenoLopez-eProsima

Is it possible to add a feature to Fast DDS that automatically cleans up broken shared memory files? I think this may be helpful for users.

llapx avatar Jul 11 '22 01:07 llapx

Hi @llapx,

It is not on the Fast DDS roadmap to include such a feature. If you are interested, you can open a ticket in the corresponding forum and see if there is enough community support. You may also be interested in contacting the Fast DDS support team for commercial support.

JLBuenoLopez avatar Jul 11 '22 07:07 JLBuenoLopez

I am using this from ROS2. I am typically stopping applications with SIGINT. Does rmw_fastrtps make a call to DomainParticipantFactory::delete_participant in that event?

@Aposhian, rmw_fastrtps_shared_cpp::destroy_participant handles the DomainParticipant destruction cleanly. I do not know how the ROS 2 stack signal handling works and if this method is called in case of SIGINT. I suppose that question should be asked in some other place.

I think this issue has been answered and can be closed. Would you mind doing it, @Aposhian? Otherwise, let me know why you consider the issue should be kept open. Maybe we should consider moving to the Q&A forum where, according to the Fast DDS contributing guidelines, questions should be kept.

JLBuenoLopez avatar Jul 11 '22 07:07 JLBuenoLopez

I can see there is a way forward by using fastdds shm clean, but I still think this presents a bad user experience. Someone who is using FastDDS with the default config may not even know that shared memory is being used, or how it is being used. They use the system for a while, and it works, and then someday it may break with the cryptic error messages that I posted above. The error message would be better if it said something like "Unable to create new shared memory segments: is your shared memory partition full? Try running fastdds shm clean." Or, if a shared memory segment cannot be created, fastdds could automatically try fastdds shm clean and retry (with a corresponding warning message as to what is going on).
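The clean-and-retry behavior suggested here can be sketched in a few lines of Python. This is only an illustration of the proposal, not how Fast DDS behaves; `create_with_retry`, `create` and `clean` are hypothetical names, and the warning text is the wording proposed above.

```python
HINT = (
    "Unable to create new shared memory segments: is your shared memory "
    "partition full? Try running fastdds shm clean."
)


def create_with_retry(create, clean, warn=print):
    """Try create(); on OSError, warn, run clean() once, and try again."""
    try:
        return create()
    except OSError as err:
        # Tell the user what is happening before touching shared files.
        warn(f"{HINT} (original error: {err}) Cleaning and retrying once.")
        clean()
        # A second failure propagates to the caller unchanged.
        return create()
```

Retrying only once keeps the failure visible when the cleanup does not free enough space.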

Aposhian avatar Jul 11 '22 14:07 Aposhian

@JLBuenoLopez-eProsima How about updating the message to be more user-friendly, as @Aposhian suggested above? I think the current message does not make it clear to the user what to do to solve the problem.

fujitatomoya avatar Aug 29 '22 18:08 fujitatomoya

I encountered the same issue on Android, and I never found a solution. I had to disable shared memory. This problem occurred after compiling the security module, and the problem still exists after undoing the modification again.

jingTian-z avatar Oct 26 '22 08:10 jingTian-z

I have created a ticket to track the enhancement of improving the log messages when initializing the SHM Transport (#3578). I am going to close this issue because the question was answered and there is a ticket tracking the improvement.

JLBuenoLopez avatar Jun 09 '23 05:06 JLBuenoLopez