swift icon indicating copy to clipboard operation
swift copied to clipboard

Fix quadratic performance of the `ListMerger` in specific usage pattern

Open nickolas-pohilets opened this issue 1 year ago • 10 comments

When visiting a tree and creating new job per each tree node, ListMerger::merge() can exhibit quadratic performance.

Every time DefaultActorImpl::drainOne() is called we have a queue of jobs consisting of 2 unprocessed jobs (in reverse order), and N processed ones, where N keeps growing. All jobs have the same priority.

Unprocessed jobs are taken out of the queue and reversed in the process.

A new instance of ListMerger is used to merge this list of 2 jobs into the list of N processed jobs. But merging is always an appending and requires linear traversal of the entire list of N processed jobs to find the insertion point.

This PR fixes this by caching in the DefaultActorImpl information about the last insertion point of the ListMerger.

Before the change:

$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10 
  # TEST           SAMPLES      MIN   MEDIAN      MAX
 50 AsyncTree.100       100   54.900   62.000  137.600
 51 AsyncTree.5000      100 3720.600 3926.800 4603.800

After the change

$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10
  # TEST           SAMPLES      MIN   MEDIAN      MAX
 50 AsyncTree.100       100   55.600   61.200  214.400
 51 AsyncTree.5000      100 2561.200 2796.100 3226.600

As expected, there is no significant difference for small number of jobs, but there is a noticeable improvement for large number of jobs.

Note that API used in attached benchmark first schedules task on the generic executor, which changes pattern of enqueueing and draining. Several drains may happen in a row, and ListMerger::merge() is called only for the first one. If in the future new API allows to schedule jobs directly onto actor, this change will have much bigger impact.

nickolas-pohilets avatar Jan 13 '24 20:01 nickolas-pohilets

@swift-ci please smoke test

ktoso avatar Feb 12 '24 10:02 ktoso

@swift-ci please smoke test

ktoso avatar Feb 12 '24 13:02 ktoso

@swift-ci please build toolchain

ktoso avatar Mar 21 '24 21:03 ktoso

@rjmccall, I get the general idea of having 5 double-linked lists, but I'm afraid I did not understand some of the details that you've mentioned.

we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them

What do you mean by "extended priority"? A value not present in the enum, e.g. 0x1d?

When talking about priority-sorting, what does "them" refer to? Jobs (which currently are sorted by priority), lists or something else?

We can keep a bit-field of which buckets have jobs in the status.

Why do we need bitfields? Can't we just check the head pointers of the double-linked lists?

Escalation can just splice the job out of its old list and put it in the new one.

Does this happen already? I cannot find it in code. AFAICS, escalating priority affects the priority of the thread that drains the actor, but does not change relative order of the jobs. Am I missing something?

nickolas-pohilets avatar Mar 31 '24 23:03 nickolas-pohilets

I've pushed a commit, which implements actor's queue as a single linked list but with 5 insertion points. This is sufficient to ensure that each job is preprocessed in O(1). But it won't work for unknown priorities. And there are no changes related to the priority escalation. Let me know what you think.

This eliminates usages of the ListMerger from default actor, but there are still 2 left in CooperativeGlobalExecutor.inc. Looks like JobQueue suffers from the same problem as default actor, and may also exhibit quadratic behaviour. If suggested approach looks good, I can extract reusable code from Actor.cpp and use it for JobQueue in CooperativeGlobalExecutor.inc, and completely remove ListMerger.

nickolas-pohilets avatar Apr 02 '24 08:04 nickolas-pohilets

@rjmccall, I get the general idea of having 5 double-linked lists, but I'm afraid I did not understand some of the details that you've mentioned.

Oh, sorry, I didn't see this comment. Let me respond to it as well for clarity's sake.

we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them

What do you mean by "extended priority"? A value not present in the enum, e.g. 0x1d?

Yes, exactly. I'd like to stay future-proof about supporting those other priority values as long as it doesn't cost us too much (and I don't expect it will).

When talking about priority-sorting, what does "them" refer to? Jobs (which currently are sorted by priority), lists or something else?

Jobs, yes. I'm just saying that it's okay if we effectively treat jobs with extended priorities as if they had one of the canonical priorities.

We can keep a bit-field of which buckets have jobs in the status.

Why do we need bitfields? Can't we just check the head pointers of the double-linked lists?

We can, yes. That's probably not a significant burden.

Escalation can just splice the job out of its old list and put it in the new one.

Does this happen already? I cannot find it in code. AFAICS, escalating priority affects the priority of the thread that drains the actor, but does not change relative order of the jobs. Am I missing something?

Oh, maybe I was thinking of something I've thought about doing but haven't ever done. Nevermind, then.

rjmccall avatar Apr 03 '24 22:04 rjmccall

Updated.

Prioritised and non-prioritised queues are fully separated. JobRef removed. Prioritised is stored in the end of the actor layout to minimise false sharing.

It was a bit challenging to update all the places where there is a check that queue is empty. Before it required only an atomic read of the ActiveActorStatus, but now it requires also reading prioritisedJobs, which can be done only under the lock:

  • in DefaultActorImpl::unlock() is safe to check before unlocking
  • in DefaultActorImpl::tryLock() I've had to add a flag to assert after locking
  • in DefaultActorImpl::destroy(), IIUC, it cannot be done, because drainer can run concurrently with destruction. Is this correct? I've left a TODO about this.

Renamed ActiveActorStatus::FirstJob to ActiveActorStatus::FirstUnprioritisedJob to audit all the usages.

I've had to share some code between stdlib/public/Concurrency/Actor.cpp and stdlib/public/Concurrency/CooperativeGlobalExecutor.inc, and I have doubts if include/swift/ABI/MetadataValues.h is the best place for this.

Benchmark results look similar:

$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10

  # TEST           SAMPLES      MIN   MEDIAN      MAX
 50 AsyncTree.100        100   55.300   63.100  120.000
 51 AsyncTree.5000       100 2543.200 2709.800 3281.800

nickolas-pohilets avatar Apr 12 '24 22:04 nickolas-pohilets

Very nice work, thank you for the update @nickolas-pohilets ! Will give it a read shortly

ktoso avatar Apr 13 '24 03:04 ktoso

@swift-ci Please test

rjmccall avatar Apr 24 '24 20:04 rjmccall

@swift-ci Please test

rjmccall avatar May 03 '24 21:05 rjmccall

@swift-ci Please test

rjmccall avatar May 07 '24 04:05 rjmccall

@swift-ci Please test macOS

rjmccall avatar May 07 '24 17:05 rjmccall

I've asked @rokhinip and @mikeash to try to find time to take a look, but otherwise I think we're all set. We'll have to talk about our risk tolerance for merging this to the 6.0 branch.

rjmccall avatar May 08 '24 18:05 rjmccall

@rjmccall, do you want some TODOs to help you get back to the topic of ordering in actor destruction and assertions?

nickolas-pohilets avatar May 10 '24 09:05 nickolas-pohilets

@swift-ci Please test

rjmccall avatar May 19 '24 20:05 rjmccall

@rjmccall, do you want some TODOs to help you get back to the topic of ordering in actor destruction and assertions?

It's okay, I'll keep track of it. Thanks for asking, though.

rjmccall avatar May 19 '24 20:05 rjmccall

@nickolas-pohilets Merged. Thank you!

rjmccall avatar May 20 '24 21:05 rjmccall