fix deadlock in RCU retire when rcu_reader active

Open nbronson opened this issue 1 year ago • 1 comments

Summary: Every 3.2 seconds, RCU retire opportunistically tries to drain the queue of pending work. If this happens while the current thread holds an rcu_reader and another thread is running synchronize(), a deadlock occurs. This diff adds a unit test that reproduces the behavior and fixes the problem.
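
For readers unfamiliar with the hazard, here is a minimal sketch of one way to keep an opportunistic drain from ever blocking. It is a toy model, not folly's rcu_domain and not necessarily what this diff does: `ToyDomain`, `Reader`, `tryDrain`, `pendingCount`, and `blockDepth` are hypothetical names, and the grace period is approximated with a simple active-reader count.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical toy model for illustration only; not folly's rcu_domain.
class ToyDomain {
 public:
  // Stand-in for rcu_reader: marks a read-side critical section.
  class Reader {
   public:
    explicit Reader(ToyDomain& d) : d_(d) {
      ++blockDepth();
      d_.activeReaders_.fetch_add(1);
    }
    ~Reader() {
      d_.activeReaders_.fetch_sub(1);
      --blockDepth();
    }
    Reader(const Reader&) = delete;
    Reader& operator=(const Reader&) = delete;

   private:
    ToyDomain& d_;
  };

  // Queue a cleanup callback, then drain only if that cannot block.
  void retire(std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> g(queueMutex_);
      pending_.push_back(std::move(cb));
    }
    tryDrain();
  }

  // Non-blocking drain. Returns true if the queued callbacks were run.
  bool tryDrain() {
    if (blockDepth() > 0) {
      return false;  // this thread is inside a reader or a callback
    }
    std::unique_lock<std::mutex> sync(syncMutex_, std::try_to_lock);
    if (!sync.owns_lock()) {
      return false;  // another thread is already draining
    }
    std::vector<std::function<void()>> work;
    {
      std::lock_guard<std::mutex> g(queueMutex_);
      if (activeReaders_.load() != 0) {
        return false;  // a reader is still active; leave the work queued
      }
      work.swap(pending_);
    }
    ++blockDepth();  // a callback that calls retire() must not re-enter
    for (auto& cb : work) {
      cb();
    }
    --blockDepth();
    return true;
  }

  std::size_t pendingCount() {
    std::lock_guard<std::mutex> g(queueMutex_);
    return pending_.size();
  }

 private:
  // Per-thread count of reasons this thread must not drain.
  static int& blockDepth() {
    thread_local int depth = 0;
    return depth;
  }

  std::mutex syncMutex_;   // serializes drains
  std::mutex queueMutex_;  // protects pending_
  std::vector<std::function<void()>> pending_;
  std::atomic<int> activeReaders_{0};
};
```

In this sketch, a `retire()` issued while the calling thread holds a `Reader` only enqueues the callback; the work runs on a later `retire()` or `tryDrain()` once no reader is active. The trade-off is that cleanup can be delayed, which is acceptable for an opportunistic path but not for a blocking synchronize().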

Test Plan:

  • unit test that reproduces the problem
  • unit test passes with the fix

nbronson avatar May 31 '23 01:05 nbronson

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Jul 10 '23 19:07 facebook-github-bot

@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Jul 17 '23 19:07 facebook-github-bot

@ot, is anybody at Meta working on a functioning implementation of rcu_barrier()? If not, would you be willing to take it as an external contribution? Rockset has been using a barrier implementation that tracks pending cleanup tasks with a second RCU domain. It doubles the memory and TLS footprint of the domain, but has essentially no performance impact (one RCU lock + unlock per cleanup list node).

nbronson avatar Jul 31 '23 18:07 nbronson