[21293] Fix destruction data-race on participant removal in intra-process

Open Mario-DL opened this issue 1 year ago • 0 comments

Description

This PR addresses a data race that occurs in stressed intra-process scenarios when an EDP writer tries to use the local reader pointer of a remote participant that has already been removed. This happens because the participant has not yet received the other participant's disposal, which travels through the transport.

Some flaky CI tests have already been identified as related to this issue.

The proposed solution introduces a new state in the reader's LocalReaderViewStatus, through which the reader signals that it is inactive as soon as it is destroyed and no one is using it. On the other side, the remote local writers that use pointers to the reader now hold a LocalReaderPointer, which wraps the reader's raw pointer plus the view. An internal counter tracks the number of references.
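The idea can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: only the names LocalReaderPointer and LocalReaderViewStatus come from the description above, while the Reader stand-in, the ACTIVE/INACTIVE values, and the acquire(), release() and deactivate() methods are assumptions made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the real reader class.
struct Reader
{
    int deliveries = 0;
    void deliver() { ++deliveries; }
};

// Assumed states; the real enum may have more values.
enum class LocalReaderViewStatus { ACTIVE, INACTIVE };

// Wraps the raw reader pointer together with its view (status + counter).
class LocalReaderPointer
{
public:
    explicit LocalReaderPointer(Reader* reader) : reader_(reader) {}

    // Writer side: take a reference only while the reader is still active.
    Reader* acquire()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (status_ != LocalReaderViewStatus::ACTIVE)
        {
            return nullptr;
        }
        ++references_;
        return reader_;
    }

    // Writer side: drop the reference and wake a pending deactivate().
    void release()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--references_ == 0)
        {
            cv_.notify_all();
        }
    }

    // Reader side: refuse new users, then wait until nobody is using it.
    void deactivate()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        status_ = LocalReaderViewStatus::INACTIVE;
        cv_.wait(lock, [this] { return references_ == 0; });
        reader_ = nullptr;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    LocalReaderViewStatus status_ = LocalReaderViewStatus::ACTIVE;
    unsigned references_ = 0;
    Reader* reader_;
};

// A writer thread keeps delivering while the reader is torn down.
bool teardown_is_safe()
{
    auto reader = std::make_unique<Reader>();
    auto pointer = std::make_shared<LocalReaderPointer>(reader.get());

    std::thread writer([pointer]
    {
        while (Reader* r = pointer->acquire())
        {
            r->deliver();
            pointer->release();
        }
    });

    pointer->deactivate();  // returns only when no delivery is in flight
    reader.reset();         // now safe to destroy the reader
    writer.join();
    return pointer->acquire() == nullptr;
}
```

In this sketch the wrapper outlives the reader (it is shared with the writer), so a writer that has not yet learned about the removal gets a null pointer instead of dereferencing freed memory.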

Thanks @MiguelCompany for helping with the final solution design.

Note: the test can be launched with --retest-until-fail 20 or so in order to reproduce the issue. To make it fail more frequently, reviewers can launch the colcon test with the taskset -c 0,1 prefix, which restricts the test to two cores and stresses it further.

@Mergifyio backport 3.1.x 3.0.x 2.14.x 2.10.x

Contributor Checklist

  • [X] Commit messages follow the project guidelines.
  • [X] The code follows the style guidelines of this project.
  • [X] Tests that thoroughly check the new feature have been added/Regression tests checking the bug and its fix have been added; the added tests pass locally
  • [X] Any new/modified methods have been properly documented using Doxygen.
  • N/A Any new configuration API has an equivalent XML API (with the corresponding XSD extension)
  • [X] Changes are backport compatible: they do NOT break ABI nor change library core behavior.
  • [X] Changes are API compatible.
  • N/A New feature has been added to the versions.md file (if applicable).
  • N/A New feature has been documented/Current behavior is correctly described in the documentation.
  • [X] Applicable backports have been included in the description.

Reviewer Checklist

  • [x] The PR has a milestone assigned.
  • [x] The title and description correctly express the PR's purpose.
  • [x] Check contributor checklist is correct.
  • [ ] If this is a critical bug fix, backports to the critical-only supported branches have been requested.
  • [ ] Check CI results: changes do not issue any warning.
  • [ ] Check CI results: failing tests are unrelated with the changes.

Mario-DL, Jul 07 '24 21:07