nestjs-query
nestjs-query copied to clipboard
Using batchQueries to solve the n+1 problem (mongo)
I was doing some testing of quering by reference and noticed that there is no solution for the n+1 problem in mongoose/typegoose. I saw this issue (https://github.com/doug-martin/nestjs-query/issues/356) which explains the exact same problem but in typeorm, this has been fixed with the batchQueryRelations function (https://github.com/doug-martin/nestjs-query/blob/fb6dc5455a412f16acc10c436881bf6fd139b34e/packages/query-sequelize/src/services/relation-query.service.ts#L343) in relation-query.service.ts of the type-orm package.
It would be nice to have something similar in mongo, otherwise queries with references will get expensive quickly.
Example case
{accommodations { nodes { id name features { name category { name } } } }}
- would do 1 query for accommodations
- would do 1 query for each accommodation features
- would do 1 query for each accommodation feature category
For the time being the documentation at https://doug-martin.github.io/nestjs-query/docs/introduction/getting-started should probably also state: "Built in dataloader avoiding the n+1 problem (n+1 is only fixen in typeorm atm)" instead of "Built in dataloader avoiding the n+1 problem"
@ruudvanbuul good point on calling out the batch optimization with typeorm, the other adapters are a lot harder to do this with. I'll add a note to the documentation.
@doug-martin What makes it so much harder in Mongo? I might try and fabricate a PR if we need this functionality in mongo
@ruudvanbuul it's been a while since we implemented it, but the challenge is around getting the unique result set for each entity in the batch query.
In typeorm we can drop down to the raw sql and do a union for each query, the reason we need to do the union is so we can sort, page, and filter independently. Here is a short psuedo sql example:
SELECT * FROM (
(SELECT sub_tasks.*, todo_items.id as `__nestjs_query_todo_item_id__` FROM sub_tasks join todo_items where sub_tasks.todo_item_id = todo_items.id WHERE todo_items.id = 1 ORDER BY sub_tasks.title DESC LIMIT 1 OFFSET 10)
UNION
(SELECT sub_tasks.*, todo_items.id as `__nestjs_query_todo_item_id__` FROM sub_tasks join todo_items where sub_tasks.todo_item_id = todo_items.id WHERE todo_items.id = 2 ORDER BY sub_tasks.title DESC LIMIT 1 OFFSET 10)
UNION
(SELECT sub_tasks.*, todo_items.id as `__nestjs_query_todo_item_id__` FROM sub_tasks join todo_items where sub_tasks.todo_item_id = todo_items.id WHERE todo_items.id = 3 ORDER BY sub_tasks.title DESC LIMIT 1 OFFSET 10)
) AS unioned_query;
Notice how each query is filtered, sorted, and paged independently then unioned to create the full result set. In typeorm I also add a unique entity identifier (e.g. __nestjs_query_todo_item_id__
) so I can map the result set back to the correct entity.
There is probably a way to do this in mongo and if you know how I would love a PR!
My 2 cents.
Aside from the problem of resolving refs, trying to normalize data like in a relational database isn't always the wise choice in No-SQL and often denormalizing is better for query optimization. For example, the given structure of accommodations to features to categories is could be considered one level too deep. Categories can be stored in the Features for sure. Storing a proper category or categories within a feature can be controlled on the insertion side of features. So, you'd still need a Categories collection to bump up the entries against. There just wouldn't be any references. And, if you did need references to categories, you'd probably want a many-to-many junction collection for that, because many features (or other entities) could fall under multiple categories. Or, you'd need an array of category references within each entity. There are multiple ways to Rome for this, which doesn't help make the solution straightforward.
All that said, I believe, either population or aggregation (with $unionWith
) could be the answer to getting referenced documents resolved. They handle both the n+1 issue albeit in different ways. I believe they both can handle ordering and limiting. The only thing I'd need to know/ need to test is if $unionWith
can be used with the same collection. I haven't used it myself, so not sure and the examples I've seen all union on different collections.
So, tl;dr; it should be possible to resolve refs, I'm just not sure of the possible solution and it working for all necessary data modeling scenarios. It might require specific modeling, which might be too restrictive. It definitely needs some extra contemplation.
The other question I'd have is where could or can even DataLoader fit in? Instead of trying solve the data merging within the database, couldn't it be done on the resolver side and let GraphQL do the merging?
Scott
Sorry for the late reply gents!
@smolinari I guess the question of where the dataloader fits in is a good one. I also expected the resolver to load all main entities and then make a key-value batch per reference (and map those in the service/resolver), but I see this could get problematic when we want to sort/filter (although only if the criteria are based on the parent entity or with multiple references to the same entity I believe) and especially paging references (when different parent entity instances have different reference instances, paging gets complicated)
The aggregate/populate route could maybe provide a possible solution for this problem in mongo indeed.
Another option could be to handle a common use case (simple references by id where no paging is needed, i.e. type, tags, categories, etc.) by handling UnpagedRelation (with only 1 reference to the entity in question) in a diffent way than other relations. Because then you can use the dataloader key look up pattern without issue.
PS. The example I gave here was more of a hypothetical one and in our actual codebase we will probably denormalize for now.
I have tested this in our current code base and @smolinari your suggested solution seems to work perfectly:
[
{
$match: {
location: '1234',
},
},
{
$facet: {
count: [{ $count: 'total' }],
accommodations: [
{ $project: { id: 1, location: 1, name: 1, rooms: 1 } },
{ $sort: { _id: 1 } },
{ $limit: 10 },
{
$lookup: {
from: 'accommodationroomfeatures',
localField: 'rooms.features',
foreignField: '_id',
as: 'rooms.features',
pipeline: [
{ $sort: { otherId: 1 } },
{ $limit: 5 },
{ $skip: 2 },
],
},
},
],
},
}
]
This mongo aggregation does exactly what you would need: load accommodations with sort/filter/paging and total count and "join" the room features and sort/page/filter per instance (all in a single query)
To implement this in the mongo query service we would need to change mongo.find() calls to mongo.aggregate() and add $lookups for each relation.
What do you guys think of this?
That looks pretty good. Question is, does it fit for all or at least most needed querying?
Scott
@smolinari Inside the lookup pipeline you could do any $match query which is similar to what you would use in mongo.find(), so I would assume all queries would be covered. I don't have much experience with the internals of nestjs query though.