mongodb-d4
Precompute Denormalized Workload Before Invoking Cost Model
When we decide to denormalize one collection into another, we need to combine the operations that access the child collection and the parent collection within the same session. That is, if we choose to denormalize collection A into collection B, and a session contains an operation op1 that accesses A and another operation op2 that accesses B, then op1 needs to be combined with op2 when we perform our cost model calculations.
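To make the combining rule concrete, here is a minimal sketch for a single session. It assumes a hypothetical `Operation` record (target collection plus operation kind) and a design expressed as a `child -> parent` dict; neither is taken from the project's code. It also assumes a single level of nesting, and it leaves a child operation unchanged when its session has no operation on the parent, since the issue does not say how that case should be rewritten.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operation:
    # Hypothetical minimal stand-in for one operation in a session.
    collection: str
    kind: str  # e.g. "query", "update", "insert"
    # Child collections whose operations have been folded into this one.
    absorbed: List[str] = field(default_factory=list)


def combine_session(ops: List[Operation], denorm: Dict[str, str]) -> List[Operation]:
    """Fold each operation on a denormalized child into an operation on its
    parent within the same session.

    `denorm` maps child collection -> parent collection, so {"A": "B"} means
    A is denormalized into B. Assumes a single level of nesting. A child
    operation with no parent operation in the session is left unchanged here.
    """
    # Work on copies so the input session is not mutated.
    copies = [Operation(op.collection, op.kind, list(op.absorbed)) for op in ops]
    # The first operation on each collection is used as the merge target.
    first_on: Dict[str, Operation] = {}
    for op in copies:
        first_on.setdefault(op.collection, op)

    result: List[Operation] = []
    for op in copies:
        parent = denorm.get(op.collection)
        target = first_on.get(parent) if parent is not None else None
        if target is not None:
            # op1 (on child A) becomes part of op2 (on parent B).
            target.absorbed.append(op.collection)
        else:
            result.append(op)
    return result
```

With `{"A": "B"}`, a session of an A query, a B query, and a C update becomes two operations: the B query (now also covering A) and the untouched C update. The order of op1 and op2 within the session does not matter, because the merge target is looked up before any operation is emitted.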
We currently try to do this in the NetworkCostComponent, but it's a big hack and it can't be reused by the other cost model components.
Instead, the CostModel class should modify the workload according to the given design before invoking the individual component classes so that we only do it once.
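The intended control flow can be sketched as follows: the combined workload is computed once per design and the same object is handed to every component. The simplified types, the `combine` callable, and the `get_cost` method name are all hypothetical, and summing the component costs is only a placeholder for however the real cost model weighs them.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical simplified types: a workload is a list of sessions and each
# operation is reduced to the name of the collection it targets.
Workload = List[List[str]]
Design = Dict[str, str]  # child collection -> parent it is denormalized into
Combiner = Callable[[Workload, Design], Workload]
Component = Callable[[Design, Workload], float]


class CostModel:
    """Apply the design to the workload once, then hand the same combined
    workload to every cost component."""

    def __init__(self, workload: Workload, components: Sequence[Component],
                 combine: Combiner) -> None:
        self.workload = workload
        self.components = list(components)
        self.combine = combine

    def get_cost(self, design: Design) -> float:
        # Combine exactly once per design instead of inside each component.
        combined = self.combine(self.workload, design)
        # Summing is a placeholder for however the real model weighs components.
        return sum(component(design, combined) for component in self.components)
```

Because the components only ever see the already-combined workload, none of them (including the network component) needs its own denormalization logic.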
I would make a separate class called WorkloadCombiner in src/workload that will do all of this work. This will allow us to keep track of what operations need to be combined for each new design based on the previous design.
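One way to get "based on the previous design" is to diff the denormalization pairs of the previous and new designs, so only the changed pairs are reprocessed. The sketch below assumes designs are `child -> parent` dicts; `design_delta` is a hypothetical helper that WorkloadCombiner could call after storing the last design it saw.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (child collection, parent collection)


def design_delta(previous: Dict[str, str],
                 current: Dict[str, str]) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (to_combine, to_undo): denormalization pairs that appear only in
    the new design, and pairs that only existed in the previous one."""
    prev_pairs = set(previous.items())
    cur_pairs = set(current.items())
    return cur_pairs - prev_pairs, prev_pairs - cur_pairs
```

Moving A from B to D, for example, yields one pair to undo, `("A", "B")`, and one to combine, `("A", "D")`, while an unchanged pair such as `("C", "B")` needs no work at all.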
- We should build indexes inside of WorkloadCombiner so that we can quickly identify which operations need to be modified for each denormalization pair. WorkloadCombiner will need to maintain the original workload so that we know how to undo changes if we go from denormalized to normalized.
- We need to discuss the different rules for which operations can be combined. For example, we cannot combine aggregate operations. If there are multiple operations on the denormalized collection, we will need to think about whether they can be combined or not.
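The index, the saved original workload, and the aggregate rule can be sketched together. Everything here is hypothetical: the `Operation` record, the collection-to-sessions index, and the method names are illustrations, not the project's API. Only the "no aggregates" rule comes from the issue; the sketch applies it to both the child and the parent side, and it deliberately treats every remaining child operation as a candidate, leaving the open question about multiple operations on the denormalized collection unresolved.

```python
import copy
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Operation:
    # Hypothetical minimal operation: target collection and operation type.
    collection: str
    kind: str  # e.g. "query", "update", "aggregate"


Workload = List[List[Operation]]  # one list of operations per session


class WorkloadCombiner:
    """Sketch of the index and bookkeeping (all names hypothetical)."""

    def __init__(self, workload: Workload) -> None:
        # Untouched copy of the input, kept so a denormalization can be undone.
        self.original: Workload = copy.deepcopy(workload)
        # Index: collection name -> ids of the sessions that access it.
        self.sessions_by_collection: Dict[str, Set[int]] = defaultdict(set)
        for sid, session in enumerate(self.original):
            for op in session:
                self.sessions_by_collection[op.collection].add(sid)

    @staticmethod
    def combinable(op: Operation) -> bool:
        # Rule from the issue: aggregate operations are never combined.
        return op.kind != "aggregate"

    def candidates(self, child: str, parent: str) -> List[Tuple[int, int]]:
        """(session id, op index) of child operations that may be merged into
        a parent operation in the same session."""
        # Only sessions that touch both collections need to be inspected.
        sids = (self.sessions_by_collection.get(child, set())
                & self.sessions_by_collection.get(parent, set()))
        result: List[Tuple[int, int]] = []
        for sid in sorted(sids):
            session = self.original[sid]
            # The parent side also needs a non-aggregate operation to merge into.
            if not any(op.collection == parent and self.combinable(op) for op in session):
                continue
            for i, op in enumerate(session):
                if op.collection == child and self.combinable(op):
                    result.append((sid, i))
        return result

    def restore(self, sid: int) -> List[Operation]:
        # Going back to a normalized design: the session as originally recorded.
        return copy.deepcopy(self.original[sid])
```

The index turns "which sessions does this pair affect?" into a set intersection instead of a scan over the whole workload, and keeping a deep copy of the input means undoing a pair is a copy rather than an inverse transformation.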