parse-server
Parallelize loading `include` objects
When preloading multiple fields, we can get some extra performance by doing multiple fetches simultaneously.
Consider a query fetching comments, and requesting that ParseServer preload "post" and "author" pointers:
GET /classes/Comment?include=post,author
In this case, we first need to fetch the resulting "Comment" documents, but after that we should be able to fetch the documents referenced by the post and author fields of the results simultaneously, as they don't depend on each other at all.
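A minimal sketch of that idea (not parse-server's actual code; `fetchComments` and `fetchByPointers` are hypothetical stand-ins for the real fetch logic): once the `Comment` rows are loaded, the two independent includes can be fired off together with `Promise.all`.

```javascript
// Hypothetical sketch: 'post' and 'author' don't depend on each other,
// so once the comments are loaded we can fetch both pointers in parallel.
async function loadWithIncludes(fetchComments, fetchByPointers) {
  const comments = await fetchComments();
  const [posts, authors] = await Promise.all([
    fetchByPointers(comments, 'post'),
    fetchByPointers(comments, 'author'),
  ]);
  return { comments, posts, authors };
}
```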
Things get a little trickier when we have nested fields:
GET /classes/Comment?include=post,post.author,post.category
To resolve this query, we first need to fetch the related posts, and only once we've added the data about those posts into the results tree can we scan it for the post.author and post.category pointers. But, once that first fetch is completed, we can unblock both of those nested queries!
The solution here is to build what is basically a dependency graph out of promises; each include path blocks while it is waiting for whatever path it depends on to finish loading. Finally, once we have that graph, we return a promise that depends on every node in it.
(Aside: Technically we only need to depend on the leaf nodes, but there shouldn't be any meaningful difference between waiting on the leaves and waiting on the entire graph, and this way we don't have to do any analysis to find the leaves.)
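The promise dependency graph described above can be sketched roughly like this (a simplified illustration, not the PR's actual implementation; `fetchInclude` is a hypothetical stand-in for the per-path fetch):

```javascript
// Hypothetical sketch of the promise dependency graph: each include path
// waits on the promise for its parent path before starting its own fetch.
// Sibling paths with the same parent unblock simultaneously.
function buildIncludeGraph(paths, fetchInclude) {
  const nodes = new Map();
  // The root (the initial query results) is already resolved.
  nodes.set('', Promise.resolve());
  // Sorting guarantees a parent like 'post' is processed before 'post.author'.
  for (const path of [...paths].sort()) {
    const parent = path.includes('.') ? path.slice(0, path.lastIndexOf('.')) : '';
    const node = (nodes.get(parent) || Promise.resolve()).then(() => fetchInclude(path));
    nodes.set(path, node);
  }
  // Return a promise that depends on every node in the graph.
  return Promise.all(nodes.values());
}
```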
Implementation Notes:

- To make these parallel updates play nicely, I started by switching the code that updates the response with the fetched objects from a style where it clones the response as it inserts the objects to a style where it mutates the original response. Because we know that each include affects independent paths in the tree, this should be a safe optimization, and it may have other beneficial effects for performance, since we build fewer intermediate representations of the response.
- While making that change, I found it easier to reason about once I split the responsibility of traversing the object graph from the responsibility of updating a given node. The traversal helper also served the `findPointers` function nicely!
- It's possible that for degenerate cases (hundreds of includes in a single query), this could result in performance degradation as many queries are kicked off in parallel. For the more common case of just a few top-level includes, this should be a noticeable speedup, as we remove a "waterfall" style dependency chain.
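The mutate-in-place approach from the first note can be illustrated with a small helper (hypothetical, for illustration only; `setAtPath` is not a parse-server function): because each include writes to a disjoint dotted path, concurrent writers never touch the same node.

```javascript
// Hypothetical helper illustrating the mutation style: write a fetched
// object into the response at a dotted path, in place, rather than
// rebuilding the tree. Safe for includes that touch disjoint paths.
function setAtPath(response, path, value) {
  const parts = path.split('.');
  let node = response;
  for (const part of parts.slice(0, -1)) {
    node = node[part];
  }
  node[parts[parts.length - 1]] = value;
  return response; // the same object, mutated in place
}
```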
Hi @noahsilas
Have you thought about how to profile this stuff so we can get some before/after benchmarks?
Gonna try and digest this.
Codecov Report
Merging #6501 (c23b09a) into master (cc5f14e) will increase coverage by 0.01%. The diff coverage is 100.00%.
@@ Coverage Diff @@
##           master    #6501   +/-   ##
==========================================
+ Coverage   93.87%   93.88%   +0.01%
==========================================
  Files         169      169
  Lines       11968    11961       -7
==========================================
- Hits        11235    11230       -5
+ Misses        733      731       -2

| Impacted Files | Coverage Δ |
|---|---|
| src/RestQuery.js | 95.86% <100.00%> (+0.38% :arrow_up:) |
| src/RestWrite.js | 93.64% <0.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc5f14e...c23b09a. Read the comment docs.
One thing I'm noticing as I've been pushing patches on to this branch to make the tests all pass is that the traversal stuff is actually quite complex. I'm thinking that it might be worthwhile to drop that from this PR. I tried a bit more of a minimalistic approach, and that looks something like this: https://github.com/Hustle/parse-server/commits/concurrent-includes-minimal (note particularly 87d6d91). Does that seem like a preferable approach? I think it's sort of hard to interpret some of the trickiness in replacePointers, but it certainly seems less likely to introduce subtle breakages. Any thoughts?
> Have you thought about how to profile this stuff so we can get some before/after benchmarks?
One available option is for us to hold off a bit on merging this into the mainline, and I can merge a copy of it into @Hustle's fork - we have some monitoring set up on our deployment, and I'll check with @TylerBrock but I think we should be able to share some of the results from that here. It's unfortunately not quite a true test - we're still using an older version (though I'm going to work on bringing that up to current later). But this code is pretty much unchanged since then, so I think that any changes we see in our branch should be fairly representative of what we would expect with this patch? It's certainly not perfect, as our workload is probably somewhat atypical, but it would be something!
Thanks for all your time on this @acinader!
@noahsilas
I had really been ignoring the traversal, replacement stuff and just focused on the concurrent queries
I took a quick look at your branch and at this pr but wasn't able to grok it quickly and I don't have time to study it right now. I will give it a try later today if I can.
Is it possible to break the two apart, i.e. will the query concurrency stand alone? And if so, could we merge that at this point.
In any event, I'll try and look at that later and get back to you.
The concern is that two replacers might be going at once, and since they rebuild the tree with their modifications, one might clobber the other. But, thinking about it with a fresh brain this morning, I think that the replacement operations are entirely synchronous. I think that in Node.js that means they won't yield back to the event loop and shouldn't be interruptible, right? So maybe we don't need to worry about that as a prerequisite.
It does sort of rely on a bit of trickiness - it's a little weird that introducing any async boundary anywhere deep in that function would break things, and it wouldn't necessarily work on exotic runtimes. I think that's probably OK!
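A small demonstration of that event-loop reasoning (a standalone illustration, not code from the PR): two fully synchronous "replacers" started together still run to completion one after the other, because neither yields back to the event loop mid-mutation.

```javascript
// A synchronous mutation can't be interleaved with another one in Node.js;
// only an `await` (or other async boundary) inside it would open a window
// for other work to run.
function syncReplace(log, tag) {
  log.push(`${tag} start`);
  // ...a synchronous tree mutation would happen here...
  log.push(`${tag} end`);
}

async function main() {
  const log = [];
  // Even when "started" together, each synchronous call runs to completion.
  await Promise.all([
    Promise.resolve().then(() => syncReplace(log, 'A')),
    Promise.resolve().then(() => syncReplace(log, 'B')),
  ]);
  return log;
}

main().then(log => console.log(log.join(', '))); // A start, A end, B start, B end
```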
Having said all that, it feels possible to separate out, I'll spend a little time investigating and testing later today!
I took a stab at replacing the traversal stuff, and that looks like this: https://github.com/Hustle/parse-server/commit/6bbae2842c0e0d13b519c9c82779d303fa5b96a1?w=1
It's definitely tricky, but I could follow up separately to simplify that, and it makes the overall diff for concurrent includes (https://github.com/parse-community/parse-server/compare/master...Hustle:concurrent-includes-only) simpler to read.
How does that feel to folks?
Sorry for the long delay here!
I've updated this PR to reflect the "minimal" branch I had linked to above, and I'll submit a separate PR relating to the traversal stuff later. (For reference, the original commits in the PR are available here: https://github.com/Hustle/parse-server/tree/concurrent-includes-original)
To add a little more confidence to this change, @Hustle spent several days with a patch very close to this one running in production. It's not quite an exact match (we're slowly upgrading from an older release), but it did give me much more confidence in the correctness and performance characteristics.
At Hustle this didn't give us the big performance boost I had been hoping for - it turns out that each query to Mongo is individually pretty quick, and compared to some of our expensive operations like making HTTP requests to other services, the cost of doing the fetches serially is small enough that it's hardly noticeable for our slowest endpoints. Bummer for me, means I need to keep working on finding more places to make other perf wins.
The good news is that our monitoring does show this patch is effective at running the fetches in parallel. Here are excerpts from traces where we're doing a query similar to `new Parse.Query(GoalStep).include('goal', 'messageScript', 'organization')`.
Before this patch, with serial includes:
After this patch, with parallel includes:
These two traces aren't exactly apples-to-apples; they're getting different object sets, and we also changed the version of the mongo driver between them, but I think that this shows that there are some query patterns that can see their performance significantly increase with this patch!
⚠️ Important change for merging PRs from Parse Server 5.0 onwards!
We are planning to release the first beta version of Parse Server 5.0 in October 2021.
If a PR contains a breaking change and is not merged before the beta release of Parse Server 5.0, it cannot be merged until the end of 2022. Instead it has to follow the Deprecation Policy and phase-in breaking changes to be merged during the course of 2022.
One of the most voiced community feedbacks was the demand for predictability in breaking changes to make it easy to upgrade Parse Server. We have made a first step towards this by introducing the Deprecation Policy in February 2021 that assists to phase-in breaking changes, giving developers time to adapt. We will follow-up with the introduction of Release Automation and a branch model that will allow breaking changes only with a new major release, scheduled for the beginning of each calendar year.
We understand that some PRs are a long time in the making and we very much appreciate your contribution. We want to make it easy for PRs that contain a breaking change and were created before the introduction of the Deprecation Policy. These PRs can be merged with a breaking change without being phased-in before the beta release of Parse Server 5.0. We are making this exception because we appreciate that this is a time of transition that requires additional effort from contributors to adapt. We encourage everyone to prepare their PRs until the end of September and account for review time and possible adaptions.
If a PR contains a breaking change and should be merged before the beta release, please mention @parse-community/server-maintenance and we will coordinate with you to merge the PR.
Thanks for your contribution and support during this transition to Parse Server release automation!
Thanks for opening this pull request!
- ❌ Please edit your post and use the provided template when creating a new pull request. This helps everyone understand your post better and asks for the essential information needed to review the pull request more quickly.
@noahsilas Do you think we can get this ready for merge? It looks very promising.
What do we have to do to get this merged @mtrezza? This unit test is 2x as fast with these changes, and these pointers are only one level down:
it('does parallel include', async () => {
  const keys = [];
  for (let i = 0; i < 100; i++) {
    keys.push(`Test${i}`);
  }
  const objects = keys.map(key => new Parse.Object(key, { [key]: key }));
  await Parse.Object.saveAll(objects);
  const test = new Parse.Object('Test');
  for (const obj of objects) {
    test.set(obj.className, obj);
  }
  await test.save();
  const start = performance.now();
  await new Parse.Query('Test').include(keys).first();
  console.log(performance.now() - start);
});
Can I start a new PR based on this one considering it is currently stale?
Yeah, I'd suggest opening a new issue to give this some more visibility, maybe we can break this down into smaller PRs if necessary due to complexity. Then open a new PR.
> This unit test is 2x as fast with these changes
Great PR to work on, praise guaranteed for whoever gets this merged; I'll pin this and add a bounty