online icon indicating copy to clipboard operation
online copied to clipboard

UnitBase: No-side-effect usage of AtomicSharedPtr<SocketPoll> instead of single `shared_ptr<T>` and `mutex`

Open sgothel opened this issue 1 year ago • 7 comments

  • Resolves: #10228
  • Target version: master

Summary

The two seperate single non-recursive mutex _lockSocketPoll and _lock were required due to the recursive call from _lock protected code into _lockSocketPoll protected code.

This PR offers new AtomicSharedPtr<T> for thread-sync'ed shared_ptr<T> access

  • provides atomic operation for std::shared_ptr<T> similar to std::atomic<std::shared_ptr<T>>.
  • further provides thread-safe assignIfNull and assignIfNotNull atomic conditional assignment methods.
  • implementation uses std::mutex to synchronize std::shared_ptr<T> access.

Using AtomicSharedPtr<T> for SocketPoll removes any potential unwarranted synchronization side-effects, similar to std::atomic<std::shared_ptr<T>>, while also providing conditional atomic assignment.

Checklist

  • [ ] I have run make prettier-write and formatted the code.
  • [ ] All commits have Change-Id
  • [ ] I have run tests with make check
  • [ ] I have issued make run and manually verified that everything looks okay
  • [ ] Documentation (manuals or wiki) has been updated or is not required

sgothel avatar Oct 16 '24 20:10 sgothel

Urk; not good. Surely we can have a getSocketPollUnlocked() and just assert that _lock is taken in there; or better just use _lock directly in that section - surely ?

This PR is not my favorite, just provided optionally - me happy as-merged.

Since the threading and caller restriction doesn't exist for methods calling into getSocketPoll() etc, the initial merged PR is IMHO sufficient and clean using 2 simple mutex. Note that my initial attempt was to simply using C++20 atomic<shared_ptr>, but neither CodeQL nor Android/iOS allowed it. Edit: I am aware that such atomic<shared_ptr> might just be implemented using an internal lock.

Yes, we could detail and constrain the call of methods calling into getSocketPoll() to avoid the lock, i.e. use _socketPoll w/o synchronization. I usually only do such tricks and 'inner method constraints' above the language level, for high frequency methods enabled in production. This is not the case here.

I could analyze the recursive nature in more detail and add the constraint to drop the lock where it currently is already taken. This would indeed require to ensure such methods are not called otherwise. Shall I do this?

sgothel avatar Oct 17 '24 11:10 sgothel

In general adding a new lock is a nightmare. We want to reduce the number of locks, and simplify interactions. Having a valid reason for having two locks to protect different members of the same class would require some rather complex scenario IMHO - which I don't see here. Similarly recursive locks are bad practice.

I don't want our code to sprout an ever increasing number of mutexes. This is in large part why I would like the stack trace of the race / contention - which may indicate some lower level problem. Perhaps you can construct a better description of what was called from where that caused what other thread to race it. Ideally with a pair of stack frames. Even better - we can get rid of this extra lock, and re-use the one we already have.

I would like to see why we are destroying tests while we are invoking new ones or whatever silly is happening.

Thanks!

mmeeks avatar Oct 17 '24 11:10 mmeeks

In general adding a new lock is a nightmare. We want to reduce the number of locks, and simplify interactions.

I also would only use synchronization when demanded by a use-case.

Having a valid reason for having two locks to protect different members of the same class would require some rather complex scenario IMHO - which I don't see here. Similarly recursive locks are bad practice.

Here, _socketPoll is accessed by more than one thread and (being new to this code), I assume it is intentional and by design. Hence the use-case requires the data to be sync'ed, i.e. using atomic or local-lock.

I don't want our code to sprout an ever increasing number of mutexes. This is in large part why I would like the stack trace of the race / contention - which may indicate some lower level problem. Perhaps you can construct a better description of what was called from where that caused what other thread to race it.

I would like to see why we are destroying tests while we are invoking new ones or whatever silly is happening.

Yes, I only added the comment to UnitBase::socketPoll()

    // We could be called from either a UnitWSD::DocBrokerDestroy (prisoner_poll)
    // or from UnitWSD::invokeTest() (coolwsd main).

will add stack traces here.

Ideally with a pair of stack frames. Even better - we can get rid of this extra lock, and re-use the one we already have.

I detailed why it might be risky to avoid having an additional sync-instrument on _socketPoll (constraints of callers).

Will add the stack traces for more clarity and give the constraint another thought. OK! As mentioned, so far the use-case implies a sync/local-lock is required - but if it turns out we want to guarantee that certain callers only happen in an already locked state ... Edit: Hence I hesitated to dive too far and imply anything - but will look at it.

sgothel avatar Oct 17 '24 12:10 sgothel

Little call-tree analysis

Called with guarded _lock, safe

  • UnitBase::endTest ()
    • caller UnitBase::exitTest

Called outside by diff threads, needs sync for _socketPoll)

  • UnitKit::postFork
  • UnitBase::~UnitBase
  • UnitBase::socketPoll()

sgothel avatar Oct 17 '24 12:10 sgothel

Best semantic coverage would have been to use C++20 atomic<shared_ptr> _socketPoll, but not possible due to tooling (see above). I.e. this would made clear that there are no mutex side-effects even if that atomic internally would use a mutex.

To clarify this (in the orig merged PR) .. we could have a replacement for atomic<shared_ptr> as intended with the getSocketPoll method - covering the data + sync. At least this would clarify intend and remove possible side-effects.

sgothel avatar Oct 17 '24 12:10 sgothel

Force push provides a new proposal for clarification, i.e. no-side-effect usage of synchronized shared_ptr<SocketPoll>, please read updated description.

If accepted, I would propose to move the new AtomicSharedPtr<T> to online's common for general use.

sgothel avatar Oct 17 '24 13:10 sgothel

One log/trace produced with my instrumented code earlier showing the data-race on _socketPoll.

Compare my marks JAU 02 and JAU 10 showing Backtraces from different threads both originating from UnitBase::socketPoll().

unit-wopi-save-on-exit_1.log.txt

sgothel avatar Oct 17 '24 14:10 sgothel