mongodb-d4
Determine whether we need unique index keys
To improve the accuracy of estimating which documents an operation will need to access or modify in the disk cost component, we may want to require that the user provide us with the list of unique index keys for their database. I previously assumed that we would be able to determine automatically from the reconstructed database which keys in a collection are unique, but keys that merely appear unique in the sample may be false positives. We will also have the index information from the MySQL workloads, so it seems reasonable to request the same for MongoDB.
The main advantage, I think, is more accurate predictions of which documents each operation will access, because we will be able to discard most of the superfluous keys instead of using catalog.getAllValues().
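To make the idea concrete, here is a minimal sketch of how known unique indexes could narrow the keys used for a lookup. The function name `keys_for_lookup` and its inputs are hypothetical and not part of the existing code; it only assumes that each unique index is available as a tuple of field names.

```python
def keys_for_lookup(predicate_keys, unique_indexes):
    """Pick the keys worth using to identify the documents an operation touches.

    predicate_keys: field names the operation filters on.
    unique_indexes: iterable of key tuples, one per unique index
                    (hypothetical input, e.g. supplied by the user).
    """
    predicate = set(predicate_keys)
    # Unique indexes whose keys all appear in the operation's predicate.
    covered = [tuple(keys) for keys in unique_indexes
               if keys and set(keys) <= predicate]
    if covered:
        # A fully covered unique index pins down at most one document,
        # so the remaining predicate keys are superfluous.
        return list(min(covered, key=len))
    # No unique index is fully covered: fall back to every predicate key.
    return list(predicate_keys)
```

For example, with unique indexes on `('_id',)` and `('user', 'ts')`, an operation filtering on `_id`, `status`, and `ts` would only need `_id`.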
If we decide to include this, then there are two things that we need:
- We will need to come up with an easy way for the user to provide this information to us. It should probably go in the configuration file. The input could come from a utility script that simply dumps out the results of `db.system.indexes.find()`.
- We will need to add a new `indexes` entry to `catalog.Collection`. This should be a list of inner dicts (like `fields`) that contain the following attributes: `'indexes': { unicode: { 'keys': [ basestring ], 'sparse': bool, 'unique': bool } }`
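The two items above could be tied together with a small converter. The following is only a sketch: `build_indexes_entry` is a hypothetical helper, it assumes the index documents have the shape returned by `db.system.indexes.find()` (with `name`, `ns`, `key`, and optional `unique` and `sparse` fields), and it assumes the outer key of `indexes` is the index name.

```python
def build_indexes_entry(index_docs, collection_ns):
    """Convert dumped index documents into the proposed 'indexes' entry.

    index_docs: iterable of dicts shaped like system.indexes documents.
    collection_ns: namespace to keep, e.g. 'mydb.users'.
    """
    entry = {}
    for doc in index_docs:
        # Skip indexes that belong to other collections.
        if doc.get('ns') != collection_ns:
            continue
        keys = list(doc['key'].keys())
        # The _id index is always unique, even though its index document
        # usually does not carry an explicit 'unique' flag.
        unique = bool(doc.get('unique', False)) or keys == ['_id']
        # Assumption: the entry is keyed by the index name.
        entry[doc['name']] = {
            'keys': keys,
            'sparse': bool(doc.get('sparse', False)),
            'unique': unique,
        }
    return entry
```

The result could then be stored under the `indexes` attribute of the matching collection in the catalog, or written into the configuration file for the user to review.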