feathers icon indicating copy to clipboard operation
feathers copied to clipboard

MongoDB Aggregation improvements

Open DaddyWarbucks opened this issue 2 years ago • 9 comments

THIS PR IS CURRENTLY A WORK IN PROGRESS

This PR's aim is to improve/extend the usage of the aggregation framework to be used on all methods. But, it also solves a number of bugs and improves general performance.

Get Method This method can now use the aggregation pipeline.

Find Method Definitely WIP. Still working on accurate count with pipeline. But, we have accomplished better performance when not using the pipeline. This is mainly accomplished with how filters.$limit === 0 now only counts the docs and doesn't do the actual model.find because it's not needed.

Create Method This method can now use the aggregation pipeline.

Update Method This method can now use the aggregation pipeline. It also solves https://github.com/feathersjs/feathers/issues/3365 and https://github.com/feathersjs/feathers/issues/3363. This is accomplished by using findOneAndReplace instead of replaceOne. Because findOneAndReplace actually returns the updated document, there is no need for the two other lookups.

Patch Method This method can now use the aggregation pipeline. It should also get a performance boost with the usage of findOneAndUpdate when possible. It now supports $sort, $limit, and $skip for multi. I am not sure why these operators were not supported before, but it seems like a valuable query to have access to.

Remove Method This method can now use the aggregation pipeline. It should also get a performance boost with the usage of findOneAndDelete when possible. It now supports $sort, $limit, and $skip for multi. I am not sure why these operators were not supported before, but it seems like a valuable query to have access to.

DaddyWarbucks avatar Dec 05 '23 09:12 DaddyWarbucks

This is looking good.

marshallswain avatar Dec 09 '23 04:12 marshallswain

Some more updates.

  • The findRaw and aggregateRaw methods have been updated to sort before projecting. This should ensure that a user can sort by a property that is not projected.
  • The countDocuments method no longer returns 0 if service.options.paginate = undefined. The user shouldn't have to have the paginate option if they want to count documents. It doesn't rely on pagination options anyway; it counts all the documents by design. It can also now handle counting with the aggregation framework.
  • The _find method has been optimized to better handle when to count and when not to. Specifically, there have been logical branches added for filters.$limit === 0 then just count and don't find, if params.paginate === false just find and don't count.

See further comments in the _find method for some more questions/context.

DaddyWarbucks avatar Dec 09 '23 06:12 DaddyWarbucks

Another note about counting and aggregation. We could finesse the pipleline with a $count stage. We would coerce the result into a "page" with { total: 100, data: [ ... ] } right out of the pipeline instead of just an array of data. This would alleviate the need to do two separate count and find operations in that case. But, it seems complicated and we might bash the user's custom pipeline stages.

DaddyWarbucks avatar Dec 09 '23 07:12 DaddyWarbucks

@marshallswain I believe I am done with this PR. I am currently testing it in a project locally and everything seems to be working as expected!

DaddyWarbucks avatar Jan 24 '24 23:01 DaddyWarbucks

This PR is complete. @marshallswain and @daffl

DaddyWarbucks avatar Apr 25 '24 23:04 DaddyWarbucks

I'm letting @marshallswain have a look at this since he did a lot of that work before. Any idea what's going on with the tests?

daffl avatar Apr 26 '24 15:04 daffl

I am not sure what you mean about the tests. They seem to be working fine for me.

I think it's a good idea to let Marshall look at this too. I do believe this is a major version bump for two reasons. The adapter previously used params.mongo to determine which find method to use. If params.mongo was present it used Model.find and otherwise used Model.aggregate. The adapter now uses params.pipeline to make this decision. If present, it uses Model.aggregate and uses Model.find otherwise. The params.mongo options are now applied to both mongo methods. The _findOrGet method was also removed because it was no longer used.

Some other added features and perf gains to consider

  • The pipeline can be used on all methods
  • Multi patch and remove now support $limit and $sort
  • Better performance for patch, update, and remove. This is accomplished by using the corresponding mongo methods that mutate and return, like findOneAndUpdate. This alleviates the need to re-lookup the record after mutation.
  • Better performance for un-paginated find by not doing an unneeded countDocuments. And a couple other little tweaks around $limit: 0.
  • Removed usage of Feathers adapter commons select function in favor of Mongo's built in projection.

I am also terrible at TS. So there may be some tweaks that need to be made there.

DaddyWarbucks avatar Apr 26 '24 15:04 DaddyWarbucks

I meant that the MongoDB tests are not passing in CI: https://github.com/feathersjs/feathers/actions/runs/8850748298/job/24305731299?pr=3366#step:8:1588

Is there a way to make this more backwards compatible? Otherwise this'd have to wait for v6 - or we start moving things out again already which we were thinking of doing anyway.

daffl avatar Apr 26 '24 15:04 daffl

Oh that's odd. I had just been running only the adapter tests locally and they worked. I hadn't noticed them failing in the CI suite. I can poke around at that later.

As far as backwards compatibility, there are three things I think.

The _findOrGet method could easily be added back. I actually just removed it today as I noticed it was no longer needed.

Using $limit and $sort in patch and remove could be easily removed. I am not sure if those are breaking changes or features. I lean feature.

For the params.pipeline and params.mongo switch, I don't think theres is a way to make it backwards compatible or that we want to. Being able to pass params.mongo to the aggregation function was actually a design goal of the changes. I should have made that more clear in previous posts. Previously you couldn't pass params.mongo to Model.aggregation because it was the "switch" and therefore called Model.find. Having options like collation, etc are valuable to both methods. And the "switch" was already a bit clunky because you then also used params.pipeline to customize aggregation. So the usage is now much clearer.

Previously, using Model.aggregate was the default. You had to pass params.mongo to use Model.find.

1 - If a user does not use params.mongo or params.pipeline. Previously it would have used Model.aggregate, now it uses Model.find. But the same results are returned, so not really a breaking change. 2 - If the user uses params.mongo. Previously it used Model.find, now it still uses Model.Find 3 - If the user uses params.pipeline. Previously it would have used Model.aggregate and it still uses Model.aggregate. 4 - If the user uses both params. Previously it would have used Model.find and ignored the params.pipeline, which is unexpected. Because they also used params.mongo it switched to find. Now it uses the appropriate Model.aggregate but also uses the params.mongo options.

So 1 works differently but shouldn't break anything. And 4 works differently, but previously it didn't work as expected.

So maybe it's not a breaking change?

DaddyWarbucks avatar Apr 26 '24 16:04 DaddyWarbucks

I finally got around to fixing the broken test and added a few more. I think this is ready for @marshallswain to look at.

DaddyWarbucks avatar May 24 '24 14:05 DaddyWarbucks

Excellent, thank you for putting this together!

daffl avatar May 29 '24 21:05 daffl