Implement Replay Benchmark

Open apavlo opened this issue 13 years ago • 0 comments

The main part of our experimental analysis section in the paper will be the performance measurements that show that the designs generated by our tool outperform the human-generated designs (i.e., Spencer) and the simple heuristic-generated designs (i.e., InitialDesigner).

The basic idea of the replay benchmark is that we are given a design file (i.e., Design) and we want to run the sample workload on the database using that design. The benchmark code will need to automatically set up the proper indexes and sharding keys (as defined in the design file) and then perform any denormalization operations according to the design. Since we don't have a lot of time, we won't bother with trying to automatically expand the sample data set into a larger database (we may need to do this for the ex.fm data set, but we can come up with something quick to do this).

Here is a rough sketch of how this will work:

The config file needs to define the following:
- A design file
- A MongoDB database that contains the sample workload trace (e.g., metadata.sessions). This should be on a different machine than the ones that are going to be used in the experiments. You can run this on the machine where the coordinator will run.
- A MongoDB database that contains the sample data set.
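
As a minimal sketch of the three config entries above, assuming an INI-style file: the section name `replay`, every key name, and the placeholder values are hypothetical, since this issue does not fix a config format. Only the `metadata.sessions` split into database and collection comes from the description above.

```python
import configparser

# Hypothetical layout; section and key names are assumptions for illustration.
SAMPLE_CONFIG = """
[replay]
design = /path/to/design.json
workload_host = coordinator-host
workload_db = metadata
workload_collection = sessions
dataset_host = db-host
dataset_db = sample
"""

# The three things the config must define: a design file, the workload
# trace database, and the sample data set database.
REQUIRED_KEYS = (
    "design",
    "workload_host", "workload_db", "workload_collection",
    "dataset_host", "dataset_db",
)


def load_replay_config(text):
    """Parse the config text and fail early if a required key is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("replay"):
        raise ValueError("missing [replay] section")
    section = parser["replay"]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    return dict(section)


config = load_replay_config(SAMPLE_CONFIG)
```

Keeping the workload host separate from the data set host mirrors the requirement that the trace lives on a different machine than the ones used in the experiments.
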

The coordinator will read in a design (as defined in the config file) and perform the proper index/sharding key configuration operations for the database. It will then transform the sample workload to combine (or split) operations based on the denormalization scheme defined in the design. This new workload should be saved to another collection.
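
The coordinator step could look roughly like the sketch below. It works on plain dictionaries and does not talk to MongoDB. The design layout (`indexes`, `shardKeys`, `denorm`), the session layout (`operations`, `type`, `collection`), and the combine rule are all assumptions for illustration, not the actual Design format: here a query on an embedded collection is dropped when it directly follows a query on its parent, on the assumption that the parent read already returns the embedded documents.

```python
def plan_setup(design):
    """List the index and shard-key steps implied by a design, in order."""
    steps = []
    for coll, spec in sorted(design.items()):
        for index_keys in spec.get("indexes", []):
            steps.append(("create_index", coll, tuple(index_keys)))
        if spec.get("shardKeys"):
            steps.append(("shard_collection", coll, tuple(spec["shardKeys"])))
    return steps


def denormalize_session(session, design):
    """Return a copy of `session` with its operations rewritten for `design`."""
    new_ops = []
    for op in session["operations"]:
        # `denorm` names the parent collection this one is embedded into.
        parent = design.get(op["collection"], {}).get("denorm")
        target = parent or op["collection"]
        rewritten = dict(op, collection=target)  # shallow copy, retargeted
        prev = new_ops[-1] if new_ops else None
        # Assumed combine rule: a query on an embedded child that directly
        # follows a query on its parent is absorbed by the parent query.
        absorbed = (
            parent is not None
            and prev is not None
            and op["type"] == "query"
            and prev["type"] == "query"
            and prev["collection"] == target
        )
        if not absorbed:
            new_ops.append(rewritten)
    return dict(session, operations=new_ops)


design = {
    "users": {"indexes": [["name"]], "shardKeys": ["_id"], "denorm": None},
    "comments": {"indexes": [], "shardKeys": [], "denorm": "users"},
}
session = {
    "session_id": 1,
    "operations": [
        {"type": "query", "collection": "users"},
        {"type": "query", "collection": "comments"},
        {"type": "update", "collection": "comments"},
    ],
}
new_session = denormalize_session(session, design)
```

The original session is left untouched, so the rewritten sessions can be written to a separate collection while the source trace stays intact.
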

When the benchmark starts running, each worker will pull down the new workload generated in the previous step. Then for each next() invocation, it will select a random session and execute all of its operations successively. We don't need to worry about two workers choosing the same session. We will need to check whether we need to reload the sample database each time.
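
A minimal sketch of the worker loop described above. The class name `ReplayWorker`, the `execute_op` callback, and the session layout are hypothetical. In the real benchmark the callback would issue the operation against the database; here it only records the operation so the control flow is visible.

```python
import random


class ReplayWorker:
    """Replays whole sessions, picking one at random per next() call."""

    def __init__(self, sessions, execute_op, seed=None):
        self.sessions = list(sessions)  # local copy of the pulled workload
        if not self.sessions:
            raise ValueError("workload contains no sessions")
        self.execute_op = execute_op
        self.rng = random.Random(seed)

    def next(self):
        # Workers do not coordinate, so two of them may pick the same session.
        session = self.rng.choice(self.sessions)
        # Operations run one after another, in their recorded order.
        for op in session["operations"]:
            self.execute_op(op)
        return session


sessions = [
    {"session_id": 1, "operations": ["op-a", "op-b"]},
    {"session_id": 2, "operations": ["op-c"]},
]
executed = []
worker = ReplayWorker(sessions, executed.append, seed=0)
replayed = worker.next()
```

Giving each worker its own `random.Random` instance keeps session selection independent across workers and makes a run reproducible when a seed is supplied.
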

Oct 22 '12 19:10 apavlo