ravendb RavenDB-17793 - Allowing control over which documents go to which shards using prefixes

Issue link

https://issues.hibernatingrhinos.com/issue/RavenDB-17793

Additional description

Right now this is just to centralize the sharding code to allow for easier prefix handling.

Apr 17 '22 11:04 ayende

This is not done yet, but I would like a review of the approach at this point.

Apr 19 '22 14:04 ayende

It looks good to me

Apr 20 '22 11:04 arekpalinski

Some design notes that we need to consider.

Here is what the sharding configuration looks like now:

{
  "DatabaseName": "CanShardByDocumentsPrefix_1",
  "DatabaseState": "Normal",
  "Topology": null,
  "Sharding": {
    "Shards": [
      {
        "Members": ["A"],
        "ReplicationFactor": 1,
        "NodesModifiedAt": "2022-04-21T11:55:27.9530548Z",
        "DatabaseTopologyIdBase64": "vjZhXm8GdUacnyqbN3EHCQ",
        "ClusterTransactionIdBase64": "IVdIasO+rE+/pK3M6mUCiA"
      },
      {
        "Members": ["A"],
        "ReplicationFactor": 1,
        "NodesModifiedAt": "2022-04-21T11:55:28.9934496Z",
        "DatabaseTopologyIdBase64": "azeloIZvI0i4zBaabTK5lw",
        "ClusterTransactionIdBase64": "IVdIasO+rE+/pK3M6mUCiA"
      },
      {
        "Members": ["A"],
        "ReplicationFactor": 1,
        "NodesModifiedAt": "2022-04-21T11:55:29.5154517Z",
        "DatabaseTopologyIdBase64": "vTqML1YryEO2aTEKGszIqw",
        "ClusterTransactionIdBase64": "IVdIasO+rE+/pK3M6mUCiA"
      }
    ],
    "ShardBucketRanges": [
      { "BucketRangeStart": 0, "ShardNumber": 0 },
      { "BucketRangeStart": 349525, "ShardNumber": 1 },
      { "BucketRangeStart": 699050, "ShardNumber": 2 }
    ],
    "Prefixed": {
      "eu/": [{ "BucketRangeStart": 0, "ShardNumber": 0 }],
      "asia/": [
        { "BucketRangeStart": 0, "ShardNumber": 1 },
        { "BucketRangeStart": 524288, "ShardNumber": 2 }
      ]
    },
    "ShardBucketMigrations": {},
    "MigrationCutOffIndex": 0,
    "ShardedDatabaseId": "CWJCLJK2yEOdc9xfvqMjzQ",
    "NumberOfShards": 3
  },
  "ConflictSolverConfig": null
}

What does this mean? The key part is here:

"ShardBucketRanges": [
  { "BucketRangeStart": 0, "ShardNumber": 0 },
  { "BucketRangeStart": 349525, "ShardNumber": 1 },
  { "BucketRangeStart": 699050, "ShardNumber": 2 }
],
"Prefixed": {
  "eu/": [{ "BucketRangeStart": 0, "ShardNumber": 0 }],
  "asia/": [
    { "BucketRangeStart": 0, "ShardNumber": 1 },
    { "BucketRangeStart": 524288, "ShardNumber": 2 }
  ]
},

In other words, this configuration says:

Documents starting with eu/ will go only to shard 0.
Documents starting with asia/ will go only to shard 1 or 2.
All other documents are spread evenly across all three shards.
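The range starts above are consistent with dividing a space of 1,048,576 (2^20) buckets as evenly as integer division allows: thirds for the default ranges, halves for asia/. The bucket count is an assumption inferred from these numbers, not something stated in this issue, and `even_range_starts` is a hypothetical helper used only for illustration.

```python
# Assumption: the bucket space is 2**20 buckets, inferred from the
# BucketRangeStart values in the config, not stated in the issue.
NUMBER_OF_BUCKETS = 2 ** 20


def even_range_starts(number_of_shards: int, buckets: int = NUMBER_OF_BUCKETS) -> list[int]:
    """Hypothetical helper: the first bucket of each shard's range when
    the bucket space is split as evenly as integer division allows."""
    return [shard * buckets // number_of_shards for shard in range(number_of_shards)]


# Default ranges for the 3-shard database in the example.
print(even_range_starts(3))  # [0, 349525, 699050]
# The asia/ split between two shards.
print(even_range_starts(2))  # [0, 524288]
```

Under this reading, asia/ documents with a bucket below 524288 go to shard 1 and the rest go to shard 2.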

This is done at the level of resolving a shard for a document ID, so it applies globally.
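A minimal sketch of what prefix-aware shard resolution could look like, under several assumptions not taken from the RavenDB code: the bucket comes from a hash of the lowercased document ID (SHA-1 from `hashlib` stands in for whatever hash RavenDB actually uses), the bucket space is 2^20, prefixes match case-insensitively, and when several prefixes match the longest one wins. `get_bucket`, `shard_for_bucket` and `resolve_shard` are hypothetical names.

```python
import bisect
import hashlib

NUMBER_OF_BUCKETS = 2 ** 20  # assumption inferred from the example config

# (BucketRangeStart, ShardNumber) pairs copied from the example config.
DEFAULT_RANGES = [(0, 0), (349525, 1), (699050, 2)]
PREFIXED = {
    "eu/": [(0, 0)],
    "asia/": [(0, 1), (524288, 2)],
}


def get_bucket(doc_id: str) -> int:
    """Hypothetical bucket function: SHA-1 stands in for the real hash;
    only the mapping into [0, NUMBER_OF_BUCKETS) matters here."""
    digest = hashlib.sha1(doc_id.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % NUMBER_OF_BUCKETS


def shard_for_bucket(bucket: int, ranges: list[tuple[int, int]]) -> int:
    """Pick the range with the largest start that is <= bucket.
    Assumes ranges are sorted by start and the first starts at 0."""
    starts = [start for start, _ in ranges]
    index = bisect.bisect_right(starts, bucket) - 1
    return ranges[index][1]


def resolve_shard(doc_id: str) -> int:
    """Single ID -> shard entry point, so prefix rules apply everywhere."""
    bucket = get_bucket(doc_id)
    lowered = doc_id.lower()
    # Assumption: the longest matching prefix wins.
    matches = [prefix for prefix in PREFIXED if lowered.startswith(prefix)]
    if matches:
        return shard_for_bucket(bucket, PREFIXED[max(matches, key=len)])
    return shard_for_bucket(bucket, DEFAULT_RANGES)


print(resolve_shard("eu/orders/1"))    # 0: every eu/ bucket maps to shard 0
print(resolve_shard("asia/orders/1"))  # 1 or 2, depending on the bucket
```

Because every lookup goes through one function, a prefix rule cannot be bypassed by a code path that computes shards differently, which is the point of centralizing the sharding code.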

However, we need to keep in mind:

The buckets are shared: a prefixed document and a non-prefixed document can get the same bucket number yet land on different shards, which complicates keeping track of bucket sizes.
How will this interact with bucket migration?
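To make the bucket-size concern concrete, the sketch below lists every shard that may hold documents with a given bucket number under the example config. It reuses the config's ranges; `shard_for_bucket` and `shards_holding_bucket` are hypothetical helpers, and the lookup assumes each range list is sorted and starts at bucket 0.

```python
import bisect

# (BucketRangeStart, ShardNumber) pairs copied from the example config.
DEFAULT_RANGES = [(0, 0), (349525, 1), (699050, 2)]
PREFIXED = {"eu/": [(0, 0)], "asia/": [(0, 1), (524288, 2)]}


def shard_for_bucket(bucket: int, ranges: list[tuple[int, int]]) -> int:
    """Largest range start <= bucket decides the shard."""
    starts = [start for start, _ in ranges]
    return ranges[bisect.bisect_right(starts, bucket) - 1][1]


def shards_holding_bucket(bucket: int) -> set[int]:
    """Every shard on which documents with this bucket number may live."""
    tables = [DEFAULT_RANGES, *PREFIXED.values()]
    return {shard_for_bucket(bucket, ranges) for ranges in tables}


print(shards_holding_bucket(0))       # {0, 1}
print(shards_holding_bucket(800000))  # {0, 2}
```

So a bucket-size table keyed by bucket number alone would mix documents stored on different shards, and migrating "bucket 0" would be ambiguous unless the prefix is part of the key.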

One option that comes to mind is to give each prefix its own bucket range. In the example above, we could store the buckets for eu/ in the range 1M - 2M, for asia/ in 2M - 3M, etc.
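A rough sketch of that option, assuming "1M" means one block of 2^20 buckets, non-prefixed documents keep block 0, and eu/ and asia/ get blocks 1 and 2. The block assignment, `global_bucket` and `shard_for_global_bucket` are hypothetical; the point is that one sorted range table can then route every document.

```python
import bisect
from typing import Optional

BUCKETS_PER_BLOCK = 2 ** 20  # assumption: "1M" in the note means 2**20 buckets

# Hypothetical block assignment: block 0 is for non-prefixed documents.
PREFIX_BLOCKS = {"eu/": 1, "asia/": 2}

# One combined, sorted table of (global BucketRangeStart, ShardNumber).
GLOBAL_RANGES = [
    (0, 0), (349525, 1), (699050, 2),        # default documents, block 0
    (1 * BUCKETS_PER_BLOCK, 0),              # eu/, block 1 -> shard 0
    (2 * BUCKETS_PER_BLOCK, 1),              # asia/, block 2, first half
    (2 * BUCKETS_PER_BLOCK + 524288, 2),     # asia/, block 2, second half
]


def global_bucket(local_bucket: int, prefix: Optional[str]) -> int:
    """Shift a 0..2**20-1 bucket into the block reserved for its prefix."""
    block = PREFIX_BLOCKS[prefix] if prefix is not None else 0
    return block * BUCKETS_PER_BLOCK + local_bucket


def shard_for_global_bucket(bucket: int) -> int:
    """Largest range start <= bucket decides the shard."""
    starts = [start for start, _ in GLOBAL_RANGES]
    return GLOBAL_RANGES[bisect.bisect_right(starts, bucket) - 1][1]


print(shard_for_global_bucket(global_bucket(0, "asia/")))  # 1
```

In this sketch every bucket number belongs to exactly one prefix, so size tracking and migration work on unique bucket IDs; the cost is a bucket space that grows with the number of prefixes.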

I don't really like that option, though.

Apr 21 '22 12:04 ayende

New PR: https://github.com/ravendb/ravendb/pull/15051

Sep 30 '22 11:09 ppekrol