mongodb-d4

Determine whether we need unique index keys

Open apavlo opened this issue 13 years ago • 0 comments

To improve the accuracy of estimating which documents an operation will need to access or modify in the disk cost component, we may want to require that the user provide us with the list of unique index keys for their database. I previously assumed that we would be able to determine automatically from the reconstructed database which keys in a collection are unique, but keys that merely look unique in the reconstructed data may be false positives. We will also have the index information for the MySQL workloads, so it might be OK to request the same for MongoDB.
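
As a rough illustration of the false-positive problem, here is a minimal sketch that infers "unique" fields purely from the documents it sees. The helper `apparently_unique_fields` and the sample documents are hypothetical and not part of the existing code; field values are assumed to be hashable.

```python
def apparently_unique_fields(documents):
    """Return the fields whose values never repeat within `documents`."""
    seen = {}          # field name -> set of values observed so far
    repeated = set()   # fields with at least one duplicate value
    for doc in documents:
        for field, value in doc.items():
            values = seen.setdefault(field, set())
            if value in values:
                repeated.add(field)
            values.add(value)
    return sorted(set(seen) - repeated)


# Hypothetical data: the reconstructed sample is a subset of the real collection.
sample = [
    {"_id": 1, "email": "a@example.com", "zip": "15213"},
    {"_id": 2, "email": "b@example.com", "zip": "02139"},
]
full = sample + [{"_id": 3, "email": "c@example.com", "zip": "15213"}]

print(apparently_unique_fields(sample))  # 'zip' looks unique in the sample
print(apparently_unique_fields(full))    # 'zip' repeats once more data is seen
```

In the sample, `zip` is indistinguishable from a truly unique key, which is why a user-supplied list of unique indexes would be more reliable.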

The main advantage is that I think we will improve the accuracy of our predictions for which documents each operation will access, because we will be able to discard most of the superfluous keys instead of using catalog.getAllValues().
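
One way the unique index information could tighten an estimate is sketched below. It relies on the fact that an equality predicate covering every key of a unique index can match at most one document. The function `max_matching_docs` and the index names are hypothetical, the index layout follows the entry proposed in this issue, and the predicate is assumed to compare against concrete (non-null) values.

```python
def max_matching_docs(equality_fields, indexes, collection_size):
    """Upper bound on the documents matched by an equality predicate.

    equality_fields: fields the operation constrains with equality
    indexes:         dict of {name: {'keys': [...], 'sparse': bool, 'unique': bool}}
    collection_size: number of documents in the collection
    """
    fields = set(equality_fields)
    for index in indexes.values():
        # All keys of a unique index are pinned, so at most one document matches.
        if index["unique"] and set(index["keys"]) <= fields:
            return min(1, collection_size)
    # No unique index applies; fall back to the whole collection.
    return collection_size


# Hypothetical index declarations for a 'users' collection.
users_indexes = {
    "email_1": {"keys": ["email"], "sparse": False, "unique": True},
    "zip_1": {"keys": ["zip"], "sparse": False, "unique": False},
}

print(max_matching_docs(["email", "zip"], users_indexes, 1000))
print(max_matching_docs(["zip"], users_indexes, 1000))
```

This only bounds the number of matching documents; how that bound would replace the values returned by catalog.getAllValues() is left open here.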

If we decide to include this, then there are two things that we need:

  1. We will need to come up with an easy way for the user to give us this information. It should probably go in the configuration file, and a utility script that simply dumps out the results of db.system.indexes.find() could generate it.

  2. We will need to add a new indexes entry to catalog.Collection. This should be a dict of inner dicts (like fields), each containing the following attributes:

    'indexes': {
      unicode: {
        'keys':   [ basestring ],
        'sparse': bool,
        'unique': bool,
      }
    }
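
The two steps above could be connected by a small converter, sketched here under a few assumptions: the input documents have the legacy db.system.indexes layout (`name`, `key`, and optional `unique` and `sparse` fields), the outer dict is keyed by index name (the issue only says the key is a unicode string), and the function name `build_indexes_entry` is hypothetical. It is written for Python 3, where `str` stands in for `unicode` and `basestring`.

```python
def build_indexes_entry(index_docs):
    """Convert system.indexes-style documents into the proposed 'indexes' entry."""
    indexes = {}
    for doc in index_docs:
        keys = list(doc["key"])  # field names, in index order
        indexes[doc["name"]] = {
            "keys": keys,
            "sparse": bool(doc.get("sparse", False)),
            # The _id index is always unique even when the flag is not reported.
            "unique": bool(doc.get("unique", False)) or keys == ["_id"],
        }
    return indexes


# Hypothetical dump for one collection.
dumped = [
    {"v": 1, "key": {"_id": 1}, "ns": "app.users", "name": "_id_"},
    {"v": 1, "key": {"email": 1}, "ns": "app.users", "name": "email_1",
     "unique": True},
    {"v": 1, "key": {"zip": 1, "city": 1}, "ns": "app.users",
     "name": "zip_1_city_1", "sparse": True},
]

entry = build_indexes_entry(dumped)
print(entry["email_1"])
```

The input would still need to be grouped by the `ns` field before conversion, so that each catalog.Collection only receives its own indexes.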
    

apavlo, Sep 20 '12 20:09