mongoid icon indicating copy to clipboard operation
mongoid copied to clipboard

MONGOID-5030 - Skip contradicting queries

Open johnnyshields opened this issue 2 years ago • 8 comments

(This PR is currently a Proof-of-Concept.)

What?

This PR implements short-circuit logic for the $in, $nin, and equality operators, in cases where the query condition deterministically results in a "none" result.

Why?

When we upgraded to MongoDB 7.0.2 from 6.x, we discovered the hard way (~4 hours of downtime) that the new Slot-Based Query Execution Engine (SBQEE) behaves differently than the Classic Engine.

In particular, we had a query with a shape like this in a collection with 250M+ records:

Reservation.where(start_time: t1..t2).lt(end_time: t2).any_in(table_ids: [n1, n2]).sort(shop_id: 1, start_time: 1)

Previously, when table_ids: { "$in" => [] } (empty array) was present, the Classic Engine was optimized to skip the query. However, the SBQEE does a full collection scan. If these queries pile-up, database CPU goes to 100% and its game over. We believe this behavior is a bug; SERVER ticket coming soon.

Solution

In MONGOID-5030 which I raised in 2020, I proposed to short-circuit these queries. ActiveRecord 6.1 did this in this PR. At the time the MongoDB answered that the query engine was optimized to skip the queries, but I think the introduction of the SBQEE makes this more urgent.

I considered 3 possible approaches:

  1. Implement the skip in Criteria.in method, at the time conditions are added to the selector.
  2. Implement a filter in Mongoid::Contextual::Mongo before the query goes to the driver, and detect contradictions in the final query.
  3. Similar to #2 but do in Mongo Ruby Driver.

Option #1 has the disadvantage that it doesn't cover cases like Person.where(name: { "$in" => [] }), so I ruled it out. Option #3 I investigated, but it would require introducing the concept of a "Null Operation" to the driver, which increases complexity. Option #2 we already have a precedent with Mongoid::Contextual::None so it might be the best approach.

This PR is currently a POC to illustrate where I will insert the logic. I will add a feature flag for this, however, I believe it is safe to implement.

Truth Tables

For reference. These tables illustrate the logic that can be short circuited. This PR will only cover the none case.

$in

Expression Boolean Equivalent Can Short-circuit?
$in => [1, 2] ? No
$in => [] F Yes (same as none)
$in => [1, 2], $in => [] ? && F == F Yes (same as none)
$in => [], $in => [] F && F == F Yes (same as none)
$not => { $in => [1, 2], $in => [] } !(? && F) == T Yes (same as all)
$not => { $in => [], $in => [] } !(F && F) == T Yes (same as all)

$nin

Expression Boolean Equivalent Can Short-circuit?
$nin => [1, 2] ? No
$nin => [] T Yes (same as all)
$nin => [1, 2], $nin => [] ? && T == ? No
$nin => [], $nin => [] T && T == T Yes (same as all)
$not => { $nin => [] } !T == F Yes (same as none)
$not => { $nin => [1, 2], $nin => [] } !(? && T) == ? No
$not => { $nin => [], $nin => [] } !(T && T) == F Yes (same as none)

johnnyshields avatar Oct 18 '23 12:10 johnnyshields

@jamis I'd appreciate your review on what I'm trying to achieve here. I'm glad to code this to completion.

johnnyshields avatar Oct 18 '23 12:10 johnnyshields

@johnnyshields Thanks for raising this, Johnny. We're discussing it internally and will let you know. In the meantime, as mentioned in the HELP ticket you raised, it sounds like this server bug was fixed in the 7.0.3 patch release.

jamis avatar Oct 18 '23 15:10 jamis

@jamis any update on the direction here? We consider this issue pretty critical after our experience with SERVER-79088

johnnyshields avatar Nov 05 '23 16:11 johnnyshields

Hi @johnnyshields, and thanks for raising this PR. The initial motivation for MONGOID-5030 was to recreate an optimization from AR6.1 that would prevent certain query shapes from being sent to the server.

Mongoid attempts to match Rails' APIs wherever possible, and incorporate optimizations where appropriate. As there were users reporting issues with the approach in AR6.1, we want to ensure we account for similar potential challenges before incorporating short-circuit logic.

Though we appreciate there was a recent issue (SERVER-79088), this PR shouldn't be considered a "bug fix" as the underlying bug should already be addressed (and will be included in the upcoming 7.0.3 server release).

This PR/ticket is still on our radar, however we want to be extremely careful as to how we approach logic changes that silently alter the intent of a user operation.

alexbevi avatar Nov 07 '23 14:11 alexbevi

@alexbevi It is fine to primarily consider this a "latency optimization" and also "ActiveRecord parity".

(That being said, no query == zero possibility of a server bug 😄)

johnnyshields avatar Nov 07 '23 14:11 johnnyshields

@alexbevi

We want to ensure we account for similar potential challenges before incorporating short-circuit logic.

The AR issue you referenced wouldn't happen from the proposed change in this PR. Here we will leverage the existing Mongo::Contextual::None abstraction which represents a non-executable query, and things like count, sum etc. are already well-defined/tested to return the appropriate result (e.g. 0 in the case of count and sum.)

Of course we always need to be careful with every change, but would like to paint the picture that this change is less risky than you might think.

johnnyshields avatar Nov 08 '23 14:11 johnnyshields

Another case which can be handled client side is $in with a nested $ operator -- currently it raises a server-side error; this error could be moved client side.

Mongo::Error::OperationFailure 
[2:BadValue]: cannot nest $ under $in

johnnyshields avatar Nov 11 '23 12:11 johnnyshields

@johnnyshields I've added more detail to MONGOID-5030 regarding where this feature fits on the roadmap at the moment.

alexbevi avatar Nov 14 '23 14:11 alexbevi