mongodb-d4
coarse-grained query split
In our MongoDB trace normalizer, we only extract fields that are of list type and have more than X elements. Each such field is taken out of its original document and becomes a new collection. A list field that does not exceed X elements stays in its old document.
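The extraction rule above can be sketched roughly as follows. This is only an illustration, not the normalizer's actual code: the function name split_large_lists, the threshold parameter (standing in for X), and the parent_id/value layout of the extracted rows are all assumptions. Only the "<collection>__<field>" naming is taken from the example later in this page.

```python
from typing import Any, Dict, List, Tuple


def split_large_lists(
    col_name: str, doc: Dict[str, Any], threshold: int
) -> Tuple[Dict[str, Any], Dict[str, List[Dict[str, Any]]]]:
    """Hypothetical helper: move list fields longer than `threshold` out of `doc`.

    Returns the trimmed document and a mapping from new collection name
    to the rows extracted for that collection.
    """
    kept: Dict[str, Any] = {}
    extracted: Dict[str, List[Dict[str, Any]]] = {}
    for field, value in doc.items():
        if isinstance(value, list) and len(value) > threshold:
            # Assumed layout: one row per list element, linked back to the parent.
            extracted[f"{col_name}__{field}"] = [
                {"parent_id": doc.get("_id"), "value": item} for item in value
            ]
        else:
            # Short lists and non-list fields stay in the old document.
            kept[field] = value
    return kept, extracted


doc = {
    "_id": "xxx",
    "students": [
        {"name": "xx", "email": "xx@db_group.edu"},
        {"name": "yy", "email": "yy@db_group.edu"},
    ],
}
kept, extracted = split_large_lists("db_group_member", doc, threshold=1)
```

With threshold=1 the two-element "students" list is moved to db_group_member__students. With threshold=2 it would stay in the old document, which is exactly why a query alone cannot tell where the field ended up.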
In the metadata reconstruction process, we rewrite the queries accordingly: the part of a query that accesses a removed field is taken out of the original query content and turned into a new query on that field in the new collection. Here we have no way to determine whether a given query needs to access the field in the new collection or in the old document, because whether the field was removed depends on how many elements it has in each document.
But since the removed fields are the ones with more elements, a query is more likely to find its data in the new collection. So we simply remove all the queries accessing the removed fields from the old query content and create new ones that access those fields in the new collection.
Here is an example. Originally we have a document in the collection db_group_member:
col_name: db_group.member
doc:
{
  _id: xxx,
  students: [
    { name: xx, email: xx@db_group.edu },
    { name: yy, email: yy@db_group.edu }
  ]
}
After the normalization, however, we have no idea whether the field "students" has been removed from the old document or not. But since we remove the fields with more elements, sending the query to the new collection is more likely to hit the data. So if the original query looks like:
{collection: db_group_member, query_content: {[{students: {name: xx}}]}}
we always remove it from the old query_content and make a new query, like:
{collection: db_group_member__students, query_content: {[{students: {name: xx}}]}}
Yeah, this is not accurate but good enough for now.
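A minimal sketch of this coarse-grained split, under a few assumptions: query_content is modeled as a plain list of {field: condition} dicts, removed_fields is the set of field names the normalizer extracted at least once for that collection, and split_query is a hypothetical name. None of this is the project's actual API.

```python
from typing import Any, Dict, List, Set


def split_query(query: Dict[str, Any], removed_fields: Set[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: always redirect predicates on removed fields."""
    collection = query["collection"]
    remaining: List[Dict[str, Any]] = []
    moved: Dict[str, List[Dict[str, Any]]] = {}

    for predicate in query["query_content"]:
        stay = {f: c for f, c in predicate.items() if f not in removed_fields}
        if stay:
            remaining.append(stay)
        for field, cond in predicate.items():
            if field in removed_fields:
                # Coarse-grained: no check of whether this particular
                # document still embeds the field.
                moved.setdefault(f"{collection}__{field}", []).append({field: cond})

    result: List[Dict[str, Any]] = []
    if remaining:
        # Keep the old query only if it still has predicates left.
        result.append({"collection": collection, "query_content": remaining})
    for new_collection, content in moved.items():
        result.append({"collection": new_collection, "query_content": content})
    return result


original = {
    "collection": "db_group_member",
    "query_content": [{"students": {"name": "xx"}}],
}
rewritten = split_query(original, removed_fields={"students"})
```

For the example query, rewritten contains a single query against db_group_member__students and nothing against db_group_member, which mirrors the always-redirect behavior described above, inaccuracy included.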