folly
fix deadlock in RCU retire when rcu_reader active
Summary: Every 3.2 seconds, RCU retire opportunistically tries to drain the queue of pending work. If this happens while the current thread holds an rcu_reader and another thread is running synchronize(), a deadlock occurs. This diff adds a unit test that reproduces the behavior and fixes the problem.
Test Plan:
- unit test that reproduces the problem
- unit test passes with the fix
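
For readers who want to see the shape of the problem, here is a minimal sketch using a toy model rather than folly's actual implementation. `ToyDomain`, `runScenario`, and the global reader counter are hypothetical names and simplifications introduced only for illustration. The sketch assumes the opportunistic drain in retire can be guarded with a try-lock, so it is skipped instead of blocking when synchronize() already holds the sync mutex. The fix in this diff may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for an RCU domain (hypothetical; NOT folly's rcu_domain).
class ToyDomain {
 public:
  using Callback = std::function<void()>;

  void lock_shared() { readers_.fetch_add(1); }
  void unlock_shared() { readers_.fetch_sub(1); }

  // Blocking: holds syncMutex_ while waiting for every reader to leave.
  void synchronize() {
    std::vector<Callback> batch;
    {
      std::lock_guard<std::mutex> g(syncMutex_);
      batch = takeQueue();  // snapshot first, then wait for readers
      while (readers_.load() != 0) std::this_thread::yield();
    }
    for (auto& cb : batch) cb();  // run callbacks outside the mutex
  }

  // Queues cb, then drains only if that cannot block.
  // Returns true if a drain actually ran.
  bool retire(Callback cb) {
    push(std::move(cb));
    // A blocking lock here would deadlock against synchronize() whenever
    // the calling thread is itself a reader, so only try the lock.
    std::unique_lock<std::mutex> g(syncMutex_, std::try_to_lock);
    if (!g.owns_lock()) return false;  // synchronize() is running: skip
    std::vector<Callback> batch = takeQueue();
    if (readers_.load() != 0) {  // a reader (possibly us) is still active
      for (auto& c : batch) push(std::move(c));  // put the work back
      return false;
    }
    g.unlock();
    for (auto& c : batch) c();
    return true;
  }

  std::size_t pending() {
    std::lock_guard<std::mutex> g(queueMutex_);
    return queue_.size();
  }

 private:
  void push(Callback cb) {
    std::lock_guard<std::mutex> g(queueMutex_);
    queue_.push_back(std::move(cb));
  }
  std::vector<Callback> takeQueue() {
    std::lock_guard<std::mutex> g(queueMutex_);
    std::vector<Callback> out;
    out.swap(queue_);
    return out;
  }

  std::atomic<int> readers_{0};
  std::mutex syncMutex_;
  std::mutex queueMutex_;
  std::vector<Callback> queue_;
};

// Reproduces the scenario from the summary: a reader retires work while
// another thread is in synchronize(). Returns how often the callback ran.
int runScenario() {
  ToyDomain dom;
  std::atomic<int> ran{0};
  dom.lock_shared();  // this thread is now a reader
  std::thread syncer([&] { dom.synchronize(); });  // waits for our unlock
  bool drained = dom.retire([&] { ran.fetch_add(1); });
  assert(!drained);         // the drain was skipped instead of blocking
  assert(ran.load() == 0);  // nothing can run while we hold the reader
  dom.unlock_shared();
  syncer.join();
  dom.synchronize();  // flush anything the first synchronize() missed
  return ran.load();
}
```

Calling `runScenario()` completes without hanging, and the retired callback runs exactly once. Replacing the try-lock in `retire` with a blocking lock would let the scenario hang whenever the syncing thread acquires the mutex first, which is the deadlock described in the summary.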
@Orvid has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@ot, is anybody at Meta working on a functioning implementation of rcu_barrier()? If not, would you be willing to take it as an external contribution? Rockset has been using a barrier implementation that tracks pending cleanup tasks with a second RCU domain. It doubles the memory and TLS footprint of the domain, but has essentially no performance impact (one RCU lock + unlock per cleanup list node).