RavenDB-17793 - Allowing to control which documents will go on which shards using prefixes

Open ayende opened this issue 4 years ago • 3 comments

Issue link

https://issues.hibernatingrhinos.com/issue/RavenDB-17793

Additional description

Right now this is just to centralize the sharding code to allow for easier prefix handling.

ayende avatar Apr 17 '22 11:04 ayende

This is not done yet, but I would like a review of the approach at this point anyway.

ayende avatar Apr 19 '22 14:04 ayende

It looks good to me

arekpalinski avatar Apr 20 '22 11:04 arekpalinski

Some design notes that we need to consider.

Here is what the sharding configuration now looks like:

{
  "DatabaseName": "CanShardByDocumentsPrefix_1",
  "DatabaseState": "Normal",
  "Topology": null,
  "Sharding": {
    "Shards": [
      {
        "Members": ["A"],
        "ReplicationFactor": 1,
        "NodesModifiedAt": "2022-04-21T11:55:27.9530548Z",
        "DatabaseTopologyIdBase64": "vjZhXm8GdUacnyqbN3EHCQ",
        "ClusterTransactionIdBase64": "IVdIasO+rE+/pK3M6mUCiA"
      },
      {
        "Members": ["A"],
        "ReplicationFactor": 1,
        "NodesModifiedAt": "2022-04-21T11:55:28.9934496Z",
        "DatabaseTopologyIdBase64": "azeloIZvI0i4zBaabTK5lw",
        "ClusterTransactionIdBase64": "IVdIasO+rE+/pK3M6mUCiA"
      },
      {
        "Members": ["A"],
        "ReplicationFactor": 1,
        "NodesModifiedAt": "2022-04-21T11:55:29.5154517Z",
        "DatabaseTopologyIdBase64": "vTqML1YryEO2aTEKGszIqw",
        "ClusterTransactionIdBase64": "IVdIasO+rE+/pK3M6mUCiA"
      }
    ],
    "ShardBucketRanges": [
      { "BucketRangeStart": 0, "ShardNumber": 0 },
      { "BucketRangeStart": 349525, "ShardNumber": 1 },
      { "BucketRangeStart": 699050, "ShardNumber": 2 }
    ],
    "Prefixed": {
      "eu/": [{ "BucketRangeStart": 0, "ShardNumber": 0 }],
      "asia/": [
        { "BucketRangeStart": 0, "ShardNumber": 1 },
        { "BucketRangeStart": 524288, "ShardNumber": 2 }
      ]
    },
    "ShardBucketMigrations": {},
    "MigrationCutOffIndex": 0,
    "ShardedDatabaseId": "CWJCLJK2yEOdc9xfvqMjzQ",
    "NumberOfShards": 3
  },
  "ConflictSolverConfig": null,
  
}

What is this? The key part is here:

"ShardBucketRanges": [
  { "BucketRangeStart": 0, "ShardNumber": 0 },
  { "BucketRangeStart": 349525, "ShardNumber": 1 },
  { "BucketRangeStart": 699050, "ShardNumber": 2 }
],
"Prefixed": {
  "eu/": [{ "BucketRangeStart": 0, "ShardNumber": 0 }],
  "asia/": [
    { "BucketRangeStart": 0, "ShardNumber": 1 },
    { "BucketRangeStart": 524288, "ShardNumber": 2 }
  ]
},

This configuration says the following:

  • Documents starting with eu/ will go only to shard 0.
  • Documents starting with asia/ will go only to shard 1 or 2.
  • All other documents are spread evenly across all three shards, following ShardBucketRanges.
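
As a rough check of the "spread equally" part, the range starts in the configuration are consistent with an even split of 1,048,576 buckets. That total is an assumption inferred from the numbers (349525, 699050, 524288), not something stated in this thread, and `equal_ranges` is a hypothetical helper, not RavenDB code:

```python
# Assumption: the bucket space has 1024 * 1024 buckets; this is inferred from
# the range starts in the configuration above, not taken from RavenDB itself.
NUM_BUCKETS = 1024 * 1024


def equal_ranges(shard_numbers, num_buckets=NUM_BUCKETS):
    """Split the bucket space evenly between the given shard numbers."""
    step = num_buckets // len(shard_numbers)
    return [
        {"BucketRangeStart": i * step, "ShardNumber": shard}
        for i, shard in enumerate(shard_numbers)
    ]


default_ranges = equal_ranges([0, 1, 2])  # same starts as "ShardBucketRanges"
asia_ranges = equal_ranges([1, 2])        # same starts as the "asia/" entry
```

Under that assumption, the same even split reproduces both the default ranges and the two-shard `asia/` entry.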

This routing is done at the level of resolving a shard for a document id, so it will apply globally.
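
To make the resolution step concrete, here is a minimal sketch of how a document id could be mapped to a shard with this configuration. It is not the code from the PR: the function names are hypothetical, `zlib.crc32` is only a stand-in for whatever hash RavenDB actually uses, and the 1,048,576 bucket count is an assumption inferred from the range starts.

```python
import bisect
import zlib

NUM_BUCKETS = 1024 * 1024  # assumption, inferred from the range starts

# (BucketRangeStart, ShardNumber) pairs copied from the configuration above.
SHARD_BUCKET_RANGES = [(0, 0), (349525, 1), (699050, 2)]
PREFIXED = {
    "eu/": [(0, 0)],
    "asia/": [(0, 1), (524288, 2)],
}


def bucket_for(doc_id):
    # Stand-in hash: the real implementation may hash the id differently.
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_BUCKETS


def shard_for_bucket(ranges, bucket):
    # Pick the last range whose start is <= bucket (ranges are sorted by start).
    starts = [start for start, _ in ranges]
    index = bisect.bisect_right(starts, bucket) - 1
    return ranges[index][1]


def shard_for(doc_id):
    # A matching prefix swaps in its own range table; the bucket stays the same.
    for prefix, ranges in PREFIXED.items():
        if doc_id.startswith(prefix):
            return shard_for_bucket(ranges, bucket_for(doc_id))
    return shard_for_bucket(SHARD_BUCKET_RANGES, bucket_for(doc_id))
```

With this sketch, `shard_for("eu/users/1")` always resolves to shard 0, an `asia/` id resolves to shard 1 or 2 depending on its bucket, and any other id falls through to the default ranges. Note that the bucket number itself does not depend on the prefix.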

However, we need to keep in mind:

  • Prefixed and non-prefixed documents share the same buckets, which may complicate keeping track of bucket sizes.
  • How will this play with bucket migration?

One option that comes to mind is to assign different bucket ranges to each prefix. In the example above, we could store the buckets for eu/ in the range 1M - 2M, for asia/ in 2M - 3M, and so on.

I don't really like that option, though.
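
For reference, the separate-ranges option could look roughly like the sketch below. This only illustrates the idea and is not something the PR implements: `offset_bucket_for` and `PREFIX_ORDER` are hypothetical names, "1M" is taken to mean 1,048,576 buckets, and `zlib.crc32` again stands in for the real hash.

```python
import zlib

NUM_BUCKETS = 1024 * 1024  # assumption: "1M" means 1,048,576 buckets

# Hypothetical ordering: each prefix gets its own 1M-wide window of buckets.
PREFIX_ORDER = ["eu/", "asia/"]


def offset_bucket_for(doc_id):
    """Return a bucket number that is unique to the prefix window of the id."""
    local = zlib.crc32(doc_id.encode("utf-8")) % NUM_BUCKETS  # stand-in hash
    for window, prefix in enumerate(PREFIX_ORDER, start=1):
        if doc_id.startswith(prefix):
            # eu/ lands in [1M, 2M), asia/ in [2M, 3M), and so on.
            return window * NUM_BUCKETS + local
    return local  # non-prefixed ids stay in [0, 1M)
```

In this variant a bucket number identifies the prefix it belongs to, so per-bucket size tracking would not mix prefixed and non-prefixed documents, at the cost of a bucket space that grows with every new prefix.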

ayende avatar Apr 21 '22 12:04 ayende

New PR: https://github.com/ravendb/ravendb/pull/15051

ppekrol avatar Sep 30 '22 11:09 ppekrol