Deadlock in v2.6.2

Open Barry-Xu-2018 opened this issue 3 years ago • 7 comments

Is there an already existing issue for this?

  • [X] I have searched the existing issues

Expected behavior

No deadlock occurs at startup

Current behavior

A high deadlock rate occurs at startup.

Steps to reproduce

The scenario in which the deadlock occurs:

Threads involved: Thread3, Thread2, Thread1

##1 Thread3 acquires mp_mutex.
##2 Thread2 acquires a shared lock on endpoints_list_mutex.
##3 Thread2 tries to acquire mp_mutex but blocks, because Thread3 already holds it (##1).
##4 Thread1 tries to acquire the write lock on endpoints_list_mutex but blocks, because the reader from ##2 still holds its shared lock. https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/include/fastrtps/utils/shared_mutex.hpp#L69-L72 At this point the write_entered flag is set, so all subsequent shared (read) lock requests on endpoints_list_mutex are blocked. https://github.com/eProsima/Fast-DDS/blob/5076ebc0c5d030cac6225b94e18ef5b17c996ef3/include/fastrtps/utils/shared_mutex.hpp#L98-L101

##5 Thread3 tries to acquire a shared lock on endpoints_list_mutex but blocks because of the write_entered flag. Thread3 now waits on Thread1, Thread1 waits on Thread2, and Thread2 waits on Thread3, so none of them can proceed.
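
The key point in ##4 and ##5 is that a waiting writer blocks new readers even though no writer holds the lock yet. Below is a minimal sketch of that behavior, assuming a simplified writer-preferring shared mutex. `WriterPreferringSharedMutex`, `writer_waiting()`, `DemoResult` and `demo_new_reader_blocked()` are hypothetical names made up for illustration; this is not the Fast DDS `shared_mutex` class, only a model of the write_entered behavior described above. To keep the example terminating, the existing reader releases its lock at the end, which is exactly what Thread2 cannot do in the real scenario because it is stuck waiting for mp_mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, simplified writer-preferring shared mutex (illustration only).
class WriterPreferringSharedMutex
{
public:

    void lock()
    {
        std::unique_lock<std::mutex> lk(mut_);
        // Wait for any earlier writer, then announce this writer.
        gate1_.wait(lk, [this] { return !write_entered_; });
        write_entered_ = true;
        // Wait until the readers that are already inside have left.
        gate2_.wait(lk, [this] { return readers_ == 0; });
    }

    void unlock()
    {
        {
            std::lock_guard<std::mutex> lk(mut_);
            write_entered_ = false;
        }
        gate1_.notify_all();
    }

    // Non-blocking variant of a shared lock: fails while a writer has entered.
    bool try_lock_shared()
    {
        std::lock_guard<std::mutex> lk(mut_);
        if (write_entered_)
        {
            return false;
        }
        ++readers_;
        return true;
    }

    void unlock_shared()
    {
        std::lock_guard<std::mutex> lk(mut_);
        --readers_;
        if (write_entered_ && readers_ == 0)
        {
            gate2_.notify_one();
        }
    }

    // Test helper: true once a writer has set the flag and is waiting for readers.
    bool writer_waiting()
    {
        std::lock_guard<std::mutex> lk(mut_);
        return write_entered_ && readers_ > 0;
    }

private:

    std::mutex mut_;
    std::condition_variable gate1_;
    std::condition_variable gate2_;
    unsigned readers_ = 0;
    bool write_entered_ = false;
};

struct DemoResult
{
    bool blocked_while_writer_waits;
    bool allowed_after_writer_done;
};

DemoResult demo_new_reader_blocked()
{
    WriterPreferringSharedMutex endpoints_list_mutex;

    // ##2: an existing reader holds the shared lock.
    bool first_reader = endpoints_list_mutex.try_lock_shared();
    assert(first_reader);

    // ##4: a writer arrives, sets write_entered and waits for the reader.
    std::thread writer([&endpoints_list_mutex]
            {
                endpoints_list_mutex.lock();
                endpoints_list_mutex.unlock();
            });
    while (!endpoints_list_mutex.writer_waiting())
    {
        std::this_thread::yield();
    }

    // ##5: a new reader is refused although only a reader holds the lock.
    bool got = endpoints_list_mutex.try_lock_shared();
    if (got)
    {
        endpoints_list_mutex.unlock_shared();
    }

    // Break the cycle: the first reader leaves, so the writer can finish.
    endpoints_list_mutex.unlock_shared();
    writer.join();

    bool allowed = endpoints_list_mutex.try_lock_shared();
    if (allowed)
    {
        endpoints_list_mutex.unlock_shared();
    }
    return DemoResult{!got, allowed};
}
```

Calling `demo_new_reader_blocked()` reports whether the new reader was refused while the writer was waiting and whether a reader is admitted again once the writer is done. In the reported deadlock the first reader never leaves, so the writer and every later reader wait forever.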

Fast DDS version/commit

v2.6.2

Platform/Architecture

Ubuntu Focal 20.04 amd64

Transport layer

Default configuration, UDPv4 & SHM

Additional context

With the same code, there is no deadlock with v2.6.0.

XML configuration file

No response

Relevant log output

No response

Network traffic capture

No response

Barry-Xu-2018 avatar Sep 21 '22 04:09 Barry-Xu-2018

@MiguelCompany @eProsima/team this is a deadlock issue; just a friendly ping.

fujitatomoya avatar Sep 21 '22 04:09 fujitatomoya

@Barry-Xu-2018 @fujitatomoya There's a proposed fix in #2976, could you check with it?

MiguelCompany avatar Oct 04 '22 09:10 MiguelCompany

@MiguelCompany thanks! we will try that out and get back to you.

fujitatomoya avatar Oct 04 '22 16:10 fujitatomoya

@Barry-Xu-2018 @fujitatomoya Did you have time to check whether #2976 fixes this?

MiguelCompany avatar Oct 14 '22 05:10 MiguelCompany

@MiguelCompany i will check the evaluation status, will get back to you soon.

fujitatomoya avatar Oct 14 '22 05:10 fujitatomoya

@MiguelCompany Based on the changed code, I think it can fix this problem. Fujita-san will provide the final evaluation result from the real environment.

Barry-Xu-2018 avatar Oct 14 '22 05:10 Barry-Xu-2018

Fujita-san

that is me 😄 family name!

fujitatomoya avatar Oct 14 '22 05:10 fujitatomoya

@fujitatomoya hello, how is the final evaluation of #2976 going?

wade30822 avatar Oct 26 '22 07:10 wade30822

Sorry, we confirmed that no deadlock was observed with this PR.

fujitatomoya avatar Oct 26 '22 16:10 fujitatomoya

@fujitatomoya thx~

wade30822 avatar Oct 27 '22 13:10 wade30822

Closing based on https://github.com/eProsima/Fast-DDS/issues/2961#issuecomment-1292271447

MiguelCompany avatar Feb 07 '23 17:02 MiguelCompany