Question about the implementation of `photon::mutex`
After reading the source code, I believe `photon::mutex` follows a FIFO policy: the next thread to acquire the lock is always the first thread in the waiting queue. This implies that there must be at least one wake-up process between the moment the lock-holding thread releases the lock and the moment the next thread acquires it and resumes execution.
In contrast, the mutex implementation in brpc allows any bthread to preempt the lock after it is unlocked, so there is only one CAS operation between the moment the holder releases the lock and the moment the next thread acquires it and resumes execution.
Why didn't photon adopt this kind of implementation? I think it would perform much better when the critical section is small and contention is heavy.
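As a minimal sketch of the single-CAS fast path mentioned above (hypothetical code using a bare `std::atomic`, not brpc's actual data structures):

```cpp
#include <atomic>

// Hypothetical lock word: once the holder clears it, any thread can become
// the next owner with a single compare-and-swap, so no wake-up sits on the
// critical path between "unlocked" and "locked again".
struct lock_word {
    std::atomic<int> state{0};  // 0 = free, 1 = held

    bool try_acquire() {
        int expected = 0;
        // the only operation between the holder's release and the next acquisition
        return state.compare_exchange_strong(expected, 1, std::memory_order_acquire);
    }

    void release() {
        // losing contenders would park on a wait list (omitted here)
        state.store(0, std::memory_order_release);
    }
};
```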
> there must be at least one wake-up process between the moment the lock-holding thread releases the lock and the moment the next thread acquires it

No, there is no such in-between process. When one thread unlocks a mutex, it directly transfers ownership of the mutex to the first waiting thread and resumes it accordingly.
```cpp
inline void do_mutex_unlock(mutex* m)
{
    SCOPED_LOCK(m->splock);        // serialize against concurrent lockers/unlockers
    ScopedLockHead h(m);           // take the first waiter on the wait list, if any
    m->owner.store(h);             // hand ownership directly to it (null if none)
    if (h)
        prelocked_thread_interrupt(h, -1);  // resume the new owner
}
```
> In contrast ... allows any bthread to preempt the lock after it is unlocked

Lock contention usually leads to low efficiency, and in your specific case it is called a "thundering herd".
@lihuiba Thank you for your reply! But I still have some questions.
> No, there is no such in-between process. When one thread unlocks a mutex, it directly transfers ownership of the mutex to the first waiting thread and resumes it accordingly.

If the lock-holder thread and the waiter thread are not on the same vcpu, `prelocked_thread_interrupt` needs to put the waiter thread into the `standbyq` and then call `cancel_wait` to wake up the other vcpu. And that vcpu may not resume the waiter thread immediately. This is what I called "at least one wake-up process".
> Lock contention usually leads to low efficiency, and in your specific case it is called a "thundering herd".

Sorry, I didn't make the preemption mechanism clear. In brpc's implementation, only the thread at the head of the wait list is awakened, and it then contends with all other incoming threads for the lock.
For example: A holds the lock, B and C are already suspended in the wait list, and D, E, and F are also going to acquire the lock.
```
+--------------------------------------+
|           INCOMING THREADS           |
|                D, E, F               |
+--------------------------------------+

+----------------------+   +--------------------------------------+
|       mutex L        |   |               WAITLIST               |
|   owner : Thread A   |   |            [ B ] -> [ C ]            |
+----------------------+   +--------------------------------------+
```
After A unlocks, it only wakes up B, and B then contends with D, E, and F for the mutex. So there is no "thundering herd" in this case.
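A compact sketch of this wake-one policy (built on `std::mutex`/`std::condition_variable` for illustration, not bthread's actual code):

```cpp
#include <condition_variable>
#include <mutex>

// Wake-one "barging" lock: unlock() releases the lock and wakes only the head
// waiter; the woken waiter then races any incoming threads and simply waits
// again if it loses.
class barging_mutex {
    std::mutex guard_;               // protects locked_
    std::condition_variable cv_;
    bool locked_ = false;

public:
    bool try_lock() {
        std::lock_guard<std::mutex> g(guard_);
        if (locked_) return false;
        locked_ = true;              // an incoming thread grabs a free lock directly
        return true;
    }

    void lock() {
        std::unique_lock<std::mutex> g(guard_);
        // A woken waiter re-checks the flag; if an incoming thread won the
        // race, it just goes back to waiting.
        cv_.wait(g, [&] { return !locked_; });
        locked_ = true;
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> g(guard_);
            locked_ = false;         // release first ...
        }
        cv_.notify_one();            // ... then wake only one waiter, so no thundering herd
    }
};
```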
> In brpc's implementation, only the thread at the head of the wait list is awakened, and it then contends with all other incoming threads for the lock.

So it behaves similarly to photon.
> `prelocked_thread_interrupt` needs to put the waiter thread into the `standbyq` and then call `cancel_wait` to wake up the other vcpu. And that vcpu may not resume the waiter thread immediately.

Even if the lock-holder thread and the waiter thread are on the same vcpu, the waiter is not executed immediately after being resumed. Instead, it is appended to the end of the running queue. This is not (much) better than appending to the standby queue in the cross-vcpu case, because the standby queue can be regarded as an extension of the tail of the running queue.
There's also the option to always insert resumed waiters at the front of the running queue (and regard the standby queue as an extension of the front of the running queue), but there's no evidence that this is better.
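A rough illustration of the two policies (types invented for illustration, not Photon's scheduler code):

```cpp
#include <deque>

struct coroutine;  // hypothetical handle type, for illustration only

// A resumed waiter can be appended to the tail of the run queue (default: it
// runs after everything already queued) or pushed to the front (prioritized:
// it runs next).
struct run_queue {
    std::deque<coroutine*> q;

    void resume_to_tail(coroutine* w)  { q.push_back(w); }   // default, fair
    void resume_to_front(coroutine* w) { q.push_front(w); }  // prioritized
};
```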
> In brpc's implementation, only the thread at the head of the wait list is awakened, and it then contends with all other incoming threads for the lock.

> So it behaves similarly to photon.

No, incoming threads will not contend for a locked mutex in photon. They simply join the tail of the mutex's wait list after failing to try_lock() it.
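For comparison, a self-contained sketch of this FIFO hand-off behavior (using `std::mutex`/`std::condition_variable` instead of Photon's coroutine primitives):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// FIFO hand-off lock: a contender that loses the fast path joins the tail of
// the wait list and sleeps; unlock() transfers ownership directly to the head
// waiter instead of releasing the lock for anyone to grab.
class fifo_handoff_mutex {
    struct waiter {
        std::condition_variable cv;
        bool granted = false;
    };
    std::mutex splock_;              // plays the role of m->splock
    bool locked_ = false;
    std::deque<waiter*> waiters_;

public:
    void lock() {
        std::unique_lock<std::mutex> g(splock_);
        if (!locked_) {              // uncontended fast path
            locked_ = true;
            return;
        }
        waiter w;
        waiters_.push_back(&w);      // FIFO: join at the tail
        w.cv.wait(g, [&] { return w.granted; });  // ownership arrives here
    }

    void unlock() {
        std::lock_guard<std::mutex> g(splock_);
        if (waiters_.empty()) {
            locked_ = false;         // nobody waiting: actually release
            return;
        }
        waiter* h = waiters_.front();  // head of the wait list
        waiters_.pop_front();
        h->granted = true;           // hand ownership over without ever unlocking
        h->cv.notify_one();
    }
};
```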
> it is appended to the end of the running queue.

That's what I was concerned about: suppose there are a lot of threads in the running queue of B's vcpu; then B has to wait a long time before it resumes execution. As a result, there is quite a long gap between A unlocking and B acquiring the lock, and the operations per second of the mutex might be quite low.
If there are lots of threads running in the vcpu, why would you choose to prioritize one thread over the others?
> If there are lots of threads running in the vcpu, why would you choose to prioritize one thread over the others?

We don't; that's why we allow incoming threads to get the lock. It reduces the time between A unlocking the mutex and the next thread, whichever one it is, acquiring the lock.
> It reduces the time between A unlocking the mutex and the next thread, whichever one it is, acquiring the lock.

There may be some scenarios where such a "prioritized mutex" is important. We have never seen such scenarios before. Can you tell us more about it?
Another potential option is to make some threads have higher priority, so that whenever they are resumed they are inserted at the head of the running queue and get executed before others. Does this meet your needs?
> > It reduces the time between A unlocking the mutex and the next thread, whichever one it is, acquiring the lock.
>
> There may be some scenarios where such a "prioritized mutex" is important. We have never seen such scenarios before. Can you tell us more about it?
>
> Another potential option is to make some threads have higher priority, so that whenever they are resumed they are inserted at the head of the running queue and get executed before others. Does this meet your needs?

Suppose we have a lock that protects a lease. Before processing each RPC, we must acquire this lock to check whether the lease is still valid. In other words, the critical section is tiny but the request rate is high.
If the lock is strictly FIFO, the timeline looks roughly like this from the mutex’s point of view:
```
| A lock | short time | A unlock, wakeup B | long delay (B sits in the vcpu's run queue, waiting for the threads ahead of it) | B resumes | B lock | short time | B unlock, wakeup C | ...
```
If the lock is preemptible, the timeline might become:
```
| A lock | short time | A unlock | C lock | A wakeup B | short time | C unlock | ...
```
The interval between “unlock” and “next lock” is now much shorter.
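A hypothetical sketch of that lease check (names invented for illustration): the critical section is only a timestamp comparison, but the lock is taken once per request.

```cpp
#include <chrono>
#include <mutex>

struct lease_guard {
    std::mutex mu;
    std::chrono::steady_clock::time_point expires_at;

    // called once per RPC, at a high rate; the critical section is tiny
    bool lease_valid() {
        std::lock_guard<std::mutex> g(mu);
        return std::chrono::steady_clock::now() < expires_at;
    }
};

// per-request path:
//   if (!guard.lease_valid()) reject(request);
//   else                      process(request);
```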
@Yriuns Please try #992 and see whether it solves your problem.
Adopting this mechanism (the brpc solution, and seemingly the pthread_mutex solution as well) in photon might lead to coroutine starvation.
If there is only one vcpu, then after each unlock the head of the waiting queue is added to the run queue and waits to be scheduled. If a new request happens to arrive and obtains the lock directly, the awakened coroutine re-enters the tail of the waiting queue. In severe cases, some coroutines may be unable to obtain the lock for a long time, resulting in long-tail latency.
Of course, the problems mentioned by @Yriuns do exist. If photon is to adopt this solution, it must address the issue of coroutine starvation.
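For illustration, the single-vcpu starvation pattern could play out like this (a hypothetical timeline in the same style as the ones above):

```
| A unlock, wakeup B (B joins tail of run queue) | D arrives, gets the lock directly | B runs, fails to lock, re-joins tail of wait list | D unlock, wakeup C | E arrives, gets the lock directly | ... (B may starve)
```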
@ivanallen Yes, it may lead to starvation. So I tend to take it as a special mechanism for special cases. I'm not sure whether it should be merged.
No matter what, the feature is only an option that doesn't change the default behavior.