tensorpipe
move to semaphore
Summary: Looping on sched_yield results in much higher latency in highly concurrent pipe I/O. It is not strictly a busy wait, but when no data is immediately available the thread relinquishes the CPU and moves to the end of the run queue for its priority (see http://man7.org/linux/man-pages/man2/sched_yield.2.html). The overhead of rescheduling and context switching can be high. Conversely, a modern semaphore/mutex is a kind of hybrid between a spinlock and a sleeping lock: it spins for the first part of the remaining scheduled CPU time and then sleeps if the lock is still unavailable. This avoids both frequent rescheduling and burning CPU on spinning. In practice it always performs better than a pure spinlock or rescheduling, even when protecting lightweight operations such as a counter update.
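To make the two waiting strategies concrete, here is a minimal sketch, not the actual tensorpipe code. All function names (`waitWithYield`, `waitWithSemaphore`, `runYieldDemo`, `runSemaphoreDemo`) are hypothetical, and it assumes Linux, where unnamed POSIX semaphores (`sem_init`) are supported. It shows the contrast only and makes no claim about relative performance.

```cpp
#include <sched.h>
#include <semaphore.h>

#include <atomic>
#include <cassert>
#include <cerrno>
#include <thread>

// Variant 1: poll a counter and give up the CPU on every miss.
// Assumes a single consumer, so the load-then-decrement pair cannot race.
void waitWithYield(std::atomic<int>& available) {
  while (available.load(std::memory_order_acquire) == 0) {
    sched_yield();  // go to the back of the run queue, then poll again
  }
  available.fetch_sub(1, std::memory_order_acq_rel);
}

// Variant 2: block on a POSIX semaphore until a producer posts.
void waitWithSemaphore(sem_t& sem) {
  // sem_wait can be interrupted by a signal handler; retry in that case.
  while (sem_wait(&sem) == -1 && errno == EINTR) {
  }
}

// Hypothetical driver: one producer publishes `items` units, one consumer
// waits for each of them with the yield loop. Returns the number consumed.
int runYieldDemo(int items) {
  std::atomic<int> available{0};
  std::thread producer([&] {
    for (int i = 0; i < items; ++i) {
      available.fetch_add(1, std::memory_order_release);
    }
  });
  int consumed = 0;
  for (int i = 0; i < items; ++i) {
    waitWithYield(available);
    ++consumed;
  }
  producer.join();
  return consumed;
}

// Same driver, but the consumer sleeps on a semaphore instead of polling.
int runSemaphoreDemo(int items) {
  sem_t sem;
  if (sem_init(&sem, /*pshared=*/0, /*value=*/0) != 0) {
    return -1;  // e.g. platforms without unnamed semaphores
  }
  std::thread producer([&] {
    for (int i = 0; i < items; ++i) {
      sem_post(&sem);
    }
  });
  int consumed = 0;
  for (int i = 0; i < items; ++i) {
    waitWithSemaphore(sem);
    ++consumed;
  }
  producer.join();
  sem_destroy(&sem);
  return consumed;
}
```

Both drivers consume every item; the difference is only in how the consumer waits. Measuring latency under contention would need a separate benchmark with many concurrent pipes, which this sketch does not attempt.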
Differential Revision: D21446104
This pull request was exported from Phabricator. Differential Revision: D21446104
I want to take a close look at this, as it goes against what I had observed, so it may take a bit to get back to you, sorry...