RavenDB-17793 - Allow controlling which documents go to which shards using prefixes
Issue link
https://issues.hibernatingrhinos.com/issue/RavenDB-17793
Additional description
Right now this is just to centralize the sharding code to allow for easier prefix handling.
This is not done yet, but I would like a review on the approach anyway at this time.
It looks good to me
Some design notes that we need to consider.
Here is what the sharding configuration now looks like:
{
  "DatabaseName": "CanShardByDocumentsPrefix_1",
  "DatabaseState": "Normal",
  "Topology": null,
  "Sharding": {
    "Shards": [
      {
        "Members": ["A"],
        "ReplicationFactor": 1,
        "NodesModifiedAt": "2022-04-21T11:55:27.9530548Z",
        "DatabaseTopologyIdBase64": "vjZhXm8GdUacnyqbN3EHCQ",
        "ClusterTransactionIdBase64": "IVdIasO+rE+/pK3M6mUCiA"
      },
      {
        "Members": ["A"],
        "ReplicationFactor": 1,
        "NodesModifiedAt": "2022-04-21T11:55:28.9934496Z",
        "DatabaseTopologyIdBase64": "azeloIZvI0i4zBaabTK5lw",
        "ClusterTransactionIdBase64": "IVdIasO+rE+/pK3M6mUCiA"
      },
      {
        "Members": ["A"],
        "ReplicationFactor": 1,
        "NodesModifiedAt": "2022-04-21T11:55:29.5154517Z",
        "DatabaseTopologyIdBase64": "vTqML1YryEO2aTEKGszIqw",
        "ClusterTransactionIdBase64": "IVdIasO+rE+/pK3M6mUCiA"
      }
    ],
    "ShardBucketRanges": [
      { "BucketRangeStart": 0, "ShardNumber": 0 },
      { "BucketRangeStart": 349525, "ShardNumber": 1 },
      { "BucketRangeStart": 699050, "ShardNumber": 2 }
    ],
    "Prefixed": {
      "eu/": [{ "BucketRangeStart": 0, "ShardNumber": 0 }],
      "asia/": [
        { "BucketRangeStart": 0, "ShardNumber": 1 },
        { "BucketRangeStart": 524288, "ShardNumber": 2 }
      ]
    },
    "ShardBucketMigrations": {},
    "MigrationCutOffIndex": 0,
    "ShardedDatabaseId": "CWJCLJK2yEOdc9xfvqMjzQ",
    "NumberOfShards": 3
  },
  "ConflictSolverConfig": null
}
What is this? The key is here:
    "ShardBucketRanges": [
      { "BucketRangeStart": 0, "ShardNumber": 0 },
      { "BucketRangeStart": 349525, "ShardNumber": 1 },
      { "BucketRangeStart": 699050, "ShardNumber": 2 }
    ],
    "Prefixed": {
      "eu/": [{ "BucketRangeStart": 0, "ShardNumber": 0 }],
      "asia/": [
        { "BucketRangeStart": 0, "ShardNumber": 1 },
        { "BucketRangeStart": 524288, "ShardNumber": 2 }
      ]
    },
What this configuration does is to say the following:
- Documents starting with eu/ will go only to shard 0.
- Documents starting with asia/ will go only to shard 1 or 2.
- All other documents are spread equally.
This is done at the level of resolving a shard for a document ID, so it applies globally.
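To make the resolution step concrete, here is a minimal sketch in Python of how a document ID could be mapped to a shard under this configuration. It is an illustration only, not the code in this PR: the function names are hypothetical, `zlib.crc32` is a stand-in for whatever hash the server actually uses, and the bucket count of 1,048,576 is an assumption inferred from the range starts in the configuration (349525 is about a third of 2^20, and 524288 is exactly half).

```python
import bisect
import zlib

# Assumption: 2**20 buckets, inferred from the range starts in the config above.
NUM_BUCKETS = 1024 * 1024

# (BucketRangeStart, ShardNumber) pairs, copied from the configuration.
DEFAULT_RANGES = [(0, 0), (349525, 1), (699050, 2)]
PREFIXED = {
    "eu/": [(0, 0)],
    "asia/": [(0, 1), (524288, 2)],
}


def bucket_for(doc_id):
    """Map a document ID to a bucket. crc32 is a stand-in hash, not the real one."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_BUCKETS


def shard_for_bucket(ranges, bucket):
    """Return the shard of the last range whose start is <= bucket."""
    starts = [start for start, _ in ranges]
    index = bisect.bisect_right(starts, bucket) - 1
    return ranges[index][1]


def shard_for(doc_id):
    """Use the prefixed ranges when the ID matches a prefix, else the default ranges."""
    ranges = DEFAULT_RANGES
    for prefix, prefixed_ranges in PREFIXED.items():
        if doc_id.startswith(prefix):
            ranges = prefixed_ranges
            break
    return shard_for_bucket(ranges, bucket_for(doc_id))
```

With these tables, any ID starting with eu/ resolves to shard 0, any ID starting with asia/ resolves to shard 1 or 2 depending on its bucket, and every other ID falls through to the default ranges.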
However, we need to keep in mind:
- The buckets are the same for prefixed and non-prefixed documents, so keeping track of bucket sizes may get complicated.
- How will this interact with bucket migration?
One option that comes to mind is to assign a different bucket range to each prefix. In the example above, we could map the buckets for eu/ to 1M - 2M, for asia/ to 2M - 3M, and so on.
I don't really like that option, though.
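For reference, the separate-ranges option could be sketched roughly as follows. This is a hypothetical illustration, not something implemented in this PR: the names are made up, `zlib.crc32` is again a stand-in hash, and "1M" is assumed to mean 1,048,576 buckets.

```python
import zlib

# Assumption: "1M" means 2**20 buckets per range.
NUM_BUCKETS = 1024 * 1024

# Hypothetical per-prefix offsets: eu/ occupies 1M - 2M, asia/ occupies 2M - 3M.
PREFIX_OFFSETS = {
    "eu/": 1 * NUM_BUCKETS,
    "asia/": 2 * NUM_BUCKETS,
}


def local_bucket(doc_id):
    """Bucket within a single 1M range. crc32 is a stand-in hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_BUCKETS


def global_bucket(doc_id):
    """Shift the local bucket into the range reserved for the ID's prefix."""
    offset = 0  # non-prefixed documents keep the original 0 - 1M range
    for prefix, prefix_offset in PREFIX_OFFSETS.items():
        if doc_id.startswith(prefix):
            offset = prefix_offset
            break
    return offset + local_bucket(doc_id)
```

Under this scheme a bucket number would identify its prefix unambiguously, which could simplify per-bucket size tracking, at the cost of a bucket space that grows with every prefix added.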
New PR: https://github.com/ravendb/ravendb/pull/15051