swift
swift copied to clipboard
Fix quadratic performance of the `ListMerger` in specific usage pattern
When visiting a tree and creating new job per each tree node, ListMerger::merge()
can exhibit quadratic performance.
Every time DefaultActorImpl::drainOne()
is called we have a queue of jobs consisting of 2 unprocessed jobs (in reverse order), and N processed ones, where N keeps growing. All jobs have the same priority.
Unprocessed jobs are taken out of the queue and reversed in the process.
A new instance of ListMerger
is used to merge this list of 2 jobs into the list of N processed jobs. But merging is always an appending and requires linear traversal of the entire list of N processed jobs to find the insertion point.
This PR fixes this by caching in the DefaultActorImpl
information about the last insertion point of the ListMerger
.
Before the change:
$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10
# TEST SAMPLES MIN MEDIAN MAX
50 AsyncTree.100 100 54.900 62.000 137.600
51 AsyncTree.5000 100 3720.600 3926.800 4603.800
After the change
$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10
# TEST SAMPLES MIN MEDIAN MAX
50 AsyncTree.100 100 55.600 61.200 214.400
51 AsyncTree.5000 100 2561.200 2796.100 3226.600
As expected, there is no significant difference for small number of jobs, but there is a noticeable improvement for large number of jobs.
Note that API used in attached benchmark first schedules task on the generic executor, which changes pattern of enqueueing and draining. Several drains may happen in a row, and ListMerger::merge()
is called only for the first one. If in the future new API allows to schedule jobs directly onto actor, this change will have much bigger impact.
@swift-ci please smoke test
@swift-ci please smoke test
@swift-ci please build toolchain
@rjmccall, I get the general idea of having 5 double-linked lists, but I'm afraid I did not understand some of the details that you've mentioned.
we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them
What do you mean by "extended priority"? A value not present in the enum, e.g. 0x1d
?
When talking about priority-sorting, what does "them" refer to? Jobs (which currently are sorted by priority), lists or something else?
We can keep a bit-field of which buckets have jobs in the status.
Why do we need bitfields? Can't we just check the head pointers of the double-linked lists?
Escalation can just splice the job out of its old list and put it in the new one.
Does this happen already? I cannot find it in code. AFAICS, escalating priority affects the priority of the thread that drains the actor, but does not change relative order of the jobs. Am I missing something?
I've pushed a commit, which implements actor's queue as a single linked list but with 5 insertion points. This is sufficient to ensure that each job is preprocessed in O(1). But it won't work for unknown priorities. And there are no changes related to the priority escalation. Let me know what you think.
This eliminates usages of the ListMerger
from default actor, but there are still 2 left in CooperativeGlobalExecutor.inc
. Looks like JobQueue
suffers from the same problem as default actor, and may also exhibit quadratic behaviour. If suggested approach looks good, I can extract reusable code from Actor.cpp
and use it for JobQueue
in CooperativeGlobalExecutor.inc
, and completely remove ListMerger
.
@rjmccall, I get the general idea of having 5 double-linked lists, but I'm afraid I did not understand some of the details that you've mentioned.
Oh, sorry, I didn't see this comment. Let me respond to it as well for clarity's sake.
we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them
What do you mean by "extended priority"? A value not present in the enum, e.g.
0x1d
?
Yes, exactly. I'd like to stay future-proof about supporting those other priority values as long as it doesn't cost us too much (and I don't expect it will).
When talking about priority-sorting, what does "them" refer to? Jobs (which currently are sorted by priority), lists or something else?
Jobs, yes. I'm just saying that it's okay if we effectively treat jobs with extended priorities as if they had one of the canonical priorities.
We can keep a bit-field of which buckets have jobs in the status.
Why do we need bitfields? Can't we just check the head pointers of the double-linked lists?
We can, yes. That's probably not a significant burden.
Escalation can just splice the job out of its old list and put it in the new one.
Does this happen already? I cannot find it in code. AFAICS, escalating priority affects the priority of the thread that drains the actor, but does not change relative order of the jobs. Am I missing something?
Oh, maybe I was thinking of something I've thought about doing but haven't ever done. Nevermind, then.
Updated.
Prioritised and non-prioritised queues are fully separated. JobRef
removed. Prioritised is stored in the end of the actor layout to minimise false sharing.
It was a bit challenging to update all the places where there is a check that queue is empty. Before it required only an atomic read of the ActiveActorStatus
, but now it requires also reading prioritisedJobs
, which can be done only under the lock:
- in
DefaultActorImpl::unlock()
is safe to check before unlocking - in
DefaultActorImpl::tryLock()
I've had to add a flag to assert after locking - in
DefaultActorImpl::destroy()
, IIUC, it cannot be done, because drainer can run concurrently with destruction. Is this correct? I've left a TODO about this.
Renamed ActiveActorStatus::FirstJob
to ActiveActorStatus::FirstUnprioritisedJob
to audit all the usages.
I've had to share some code between stdlib/public/Concurrency/Actor.cpp
and stdlib/public/Concurrency/CooperativeGlobalExecutor.inc
, and I have doubts if include/swift/ABI/MetadataValues.h
is the best place for this.
Benchmark results look similar:
$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10
# TEST SAMPLES MIN MEDIAN MAX
50 AsyncTree.100 100 55.300 63.100 120.000
51 AsyncTree.5000 100 2543.200 2709.800 3281.800
Very nice work, thank you for the update @nickolas-pohilets ! Will give it a read shortly
@swift-ci Please test
@swift-ci Please test
@swift-ci Please test
@swift-ci Please test macOS
I've asked @rokhinip and @mikeash to try to find time to take a look, but otherwise I think we're all set. We'll have to talk about our risk tolerance for merging this to the 6.0 branch.
@rjmccall, do you want some TODOs to help you get back to the topic of ordering in actor destruction and assertions?
@swift-ci Please test
@rjmccall, do you want some TODOs to help you get back to the topic of ordering in actor destruction and assertions?
It's okay, I'll keep track of it. Thanks for asking, though.
@nickolas-pohilets Merged. Thank you!