online
UnitBase: No-side-effect usage of AtomicSharedPtr<SocketPoll> instead of single `shared_ptr<T>` and `mutex`
- Resolves: #10228
- Target version: master
Summary
The two separate non-recursive mutexes, `_lockSocketPoll` and `_lock`, were required due to the recursive call from `_lock`-protected code into `_lockSocketPoll`-protected code.
This PR offers a new `AtomicSharedPtr<T>` for thread-synchronized `shared_ptr<T>` access:
- provides atomic operations for `std::shared_ptr<T>`, similar to `std::atomic<std::shared_ptr<T>>`
- further provides thread-safe `assignIfNull` and `assignIfNotNull` atomic conditional assignment methods
- implementation uses a `std::mutex` to synchronize `std::shared_ptr<T>` access
Using `AtomicSharedPtr<T>` for `SocketPoll` removes any potential unwarranted synchronization side-effects, similar to `std::atomic<std::shared_ptr<T>>`, while also providing conditional atomic assignment.
Checklist
- [ ] I have run `make prettier-write` and formatted the code.
- [ ] All commits have Change-Id
- [ ] I have run tests with `make check`
- [ ] I have issued `make run` and manually verified that everything looks okay
- [ ] Documentation (manuals or wiki) has been updated or is not required
Urk; not good. Surely we can have a getSocketPollUnlocked() and just assert that _lock is taken in there; or better, just use _lock directly in that section - surely?
This PR is not my favorite and is offered only as an option - I am happy with the code as merged.
Since no threading or caller restriction exists for the methods calling into getSocketPoll() etc., the initially merged PR is IMHO sufficient and clean, using two simple mutexes. Note that my initial attempt was simply to use the C++20 `atomic<shared_ptr>`, but neither CodeQL nor Android/iOS allowed it. Edit: I am aware that such an `atomic<shared_ptr>` might itself be implemented using an internal lock.
Yes, we could constrain the callers of methods that reach getSocketPoll() so the lock can be avoided, i.e. use `_socketPoll` without synchronization. I usually reserve such tricks and 'inner method constraints' above the language level for high-frequency methods enabled in production, which is not the case here.
I could analyze the recursive nature in more detail and add the constraint to drop the lock where it is currently already taken. This would indeed require ensuring that such methods are not called otherwise. Shall I do this?
In general adding a new lock is a nightmare. We want to reduce the number of locks, and simplify interactions. Having a valid reason for having two locks to protect different members of the same class would require some rather complex scenario IMHO - which I don't see here. Similarly recursive locks are bad practice.
I don't want our code to sprout an ever increasing number of mutexes. This is in large part why I would like the stack trace of the race / contention - which may indicate some lower level problem. Perhaps you can construct a better description of what was called from where that caused what other thread to race it. Ideally with a pair of stack frames. Even better - we can get rid of this extra lock, and re-use the one we already have.
I would like to see why we are destroying tests while we are invoking new ones or whatever silly is happening.
Thanks!
> In general adding a new lock is a nightmare. We want to reduce the number of locks, and simplify interactions.

I also would only use synchronization when demanded by a use-case.

> Having a valid reason for having two locks to protect different members of the same class would require some rather complex scenario IMHO - which I don't see here. Similarly recursive locks are bad practice.

Here, `_socketPoll` is accessed by more than one thread and (being new to this code) I assume that is intentional and by design. Hence the use-case requires the data to be synchronized, i.e. using an atomic or a local lock.

> I don't want our code to sprout an ever increasing number of mutexes. This is in large part why I would like the stack trace of the race / contention - which may indicate some lower level problem. Perhaps you can construct a better description of what was called from where that caused what other thread to race it.

> I would like to see why we are destroying tests while we are invoking new ones or whatever silly is happening.
Yes, I only added the comment to UnitBase::socketPoll():

```cpp
// We could be called from either a UnitWSD::DocBrokerDestroy (prisoner_poll)
// or from UnitWSD::invokeTest() (coolwsd main).
```

Will add stack traces here.
> Ideally with a pair of stack frames. Even better - we can get rid of this extra lock, and re-use the one we already have.
I detailed why it might be risky to avoid an additional synchronization instrument on `_socketPoll` (constraints on the callers).
I will add the stack traces for more clarity and give the constraint another thought. OK! As mentioned, so far the use-case implies that a synchronized access/local lock is required - but it may turn out we want to guarantee that certain callers only run in an already-locked state. Edit: Hence I hesitated to dive too far and imply anything - but I will look at it.
Little call-tree analysis
Called with `_lock` held (safe):
- UnitBase::endTest()
  - caller: UnitBase::exitTest

Called from different threads without the lock (needs sync for `_socketPoll`):
- UnitKit::postFork
- UnitBase::~UnitBase
- UnitBase::socketPoll()
The best semantic coverage would have been to use a C++20 `atomic<shared_ptr>` for `_socketPoll`, but that is not possible due to tooling (see above).
It would have made clear that there are no mutex side-effects, even if that atomic internally uses a mutex.
To clarify this (in the originally merged PR), we could have a replacement for `atomic<shared_ptr>`, as intended with the getSocketPoll method - covering the data plus its synchronization.
At least this would clarify the intent and remove possible side-effects.
Force push provides a new proposal for clarification, i.e. no-side-effect usage of a synchronized `shared_ptr<SocketPoll>` - please read the updated description.
If accepted, I would propose to move the new AtomicSharedPtr<T> to online's common for general use.
One log/trace produced earlier with my instrumented code shows the data race on `_socketPoll`.
Compare my marks JAU 02 and JAU 10, showing backtraces from two different threads, both originating from UnitBase::socketPoll().