mongodb-d4

Precompute Denormalized Workload Before Invoking Cost Model

Open apavlo opened this issue 13 years ago • 0 comments

When we decide to denormalize one collection into another, we need to combine operations together if they access the child collection and the parent collection in the same session. That is, if we choose to denormalize collection A into collection B, and if there is an operation op1 that accesses A and another operation op2 that accesses B in the same session, then op1 needs to be combined with op2 when we perform our cost model calculations.
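As a rough sketch of that condition, the snippet below lists the candidate (op1, op2) pairs in a single session. It assumes a session is a list of operation dicts with a hypothetical `collection` key; the function name `find_combinable_pairs` is illustrative, and the actual workload schema may differ.

```python
def find_combinable_pairs(session_ops, child, parent):
    """Return (child_op, parent_op) pairs that occur in the same session.

    `child` is the collection being denormalized into `parent`.
    The dict layout ({"collection": ...}) is an assumption for illustration.
    """
    child_ops = [op for op in session_ops if op["collection"] == child]
    parent_ops = [op for op in session_ops if op["collection"] == parent]
    # Every child-collection op is a candidate for merging with every
    # parent-collection op from the same session.
    return [(c, p) for c in child_ops for p in parent_ops]
```

A session that touches only A, or only B, yields no pairs, so its operations would be left as they are.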

We currently try to do this in the NetworkCostComponent but it's a big hack and it's not reusable by the other cost model components.

Instead, the CostModel class should modify the workload according to the given design before invoking the individual component classes so that we only do it once.
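The intended control flow might look something like the following. This is only a sketch: the method names `overall_cost`, `process`, and `get_cost` are placeholders rather than the existing interfaces, and the combiner is assumed to return a workload already rewritten for the design.

```python
class CostModel:
    """Rewrites the workload once per design, then hands it to every component."""

    def __init__(self, components, combiner):
        self.components = components  # e.g. network, disk, skew components
        self.combiner = combiner      # hypothetical object exposing process(design)

    def overall_cost(self, design):
        # Combine operations a single time for this design...
        workload = self.combiner.process(design)
        # ...and let each component evaluate the same rewritten workload.
        return sum(c.get_cost(design, workload) for c in self.components)
```

With this shape, no component needs its own denormalization logic; each one only sees the combined workload.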

I would make a separate class called WorkloadCombiner in src/workload that will do all of this work. This will allow us to keep track of what operations need to be combined for each new design based on the previous design.

  1. We should build indexes inside of WorkloadCombiner so that we can quickly identify which operations need to be modified for each denormalization pair.
  2. WorkloadCombiner will need to maintain the original workload so that we know how to undo changes if we go from denormalized to normalized.
  3. We need to discuss the rules for which operations can be combined. For example, we cannot combine aggregate operations. If there are multiple operations on the denormalized collection, we will need to think about whether they can be combined or not.
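To make the three points concrete, here is a minimal sketch of what WorkloadCombiner could look like. Everything in it is an assumption for discussion: the workload is modeled as a list of sessions, each a list of dicts with hypothetical `collection` and `type` keys, and a design is reduced to a list of (child, parent) pairs. For simplicity it rebuilds from the original workload on every call rather than applying changes incrementally from the previous design.

```python
import copy


class WorkloadCombiner:
    # Operation types that are never merged (rule from point 3; list is illustrative).
    NON_COMBINABLE = {"aggregate"}

    def __init__(self, sessions):
        # Point 2: keep a pristine copy so any design can be undone.
        self.original = copy.deepcopy(sessions)
        # Point 1: index collection name -> positions of sessions that touch it.
        self.index = {}
        for i, ops in enumerate(self.original):
            for op in ops:
                self.index.setdefault(op["collection"], set()).add(i)

    def process(self, denorm_pairs):
        """Return a rewritten copy of the workload for the given (child, parent) pairs."""
        sessions = copy.deepcopy(self.original)
        for child, parent in denorm_pairs:
            # Only sessions that touch both collections can need combining.
            hits = self.index.get(child, set()) & self.index.get(parent, set())
            for i in hits:
                sessions[i] = self._combine(sessions[i], child, parent)
        return sessions

    def _combine(self, ops, child, parent):
        # Pick the first combinable operation on the parent as the merge target.
        target = next((op for op in ops if op["collection"] == parent
                       and op["type"] not in self.NON_COMBINABLE), None)
        if target is None:
            return ops
        kept = []
        for op in ops:
            if op["collection"] == child and op["type"] not in self.NON_COMBINABLE:
                target.setdefault("combined", []).append(op)  # fold into parent op
            else:
                kept.append(op)  # aggregates and unrelated ops stay as they are
        return kept
```

Calling `process([])` returns the normalized workload again, which covers the undo case. How to choose the merge target when several operations hit the denormalized collection is left open here; the sketch just takes the first one.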

apavlo, Sep 27 '12 17:09