Enhance autoTables API to support more flexible sharding rules

Open strongduanmu opened this issue 1 year ago • 9 comments

Feature Request

Is your feature request related to a problem?

#33341

Describe the feature you would like.

In #33341, we temporarily removed the ShardingRouteAlgorithmException check logic to support storing different sharding tables in different databases. However, this may weaken the normal checks: for example, if the sharding algorithm itself has a problem, it can now route to non-existent nodes without being caught.

A better way to support this is to follow the existing autoTables usage and allow users to configure actualDataNodes. It is only necessary to ensure that each actual table name in actualDataNodes is globally unique, so that the sharding algorithm can directly find the corresponding database and table information from the actual table name. The traditional databaseStrategy and tableStrategy would then get more stringent configuration checks and would not allow irregular table sharding under database sharding, because in that case table sharding is based on the database sharding logic.

The new configuration might look like this:

- !SHARDING
  autoTables:
    t_order:
      actualDataNodes: ds_${0}.t_order_${0..3},ds_${1}.t_order_${4..7}
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: t_order_snowflake
      logicTable: t_order
      shardingStrategy:
        standard:
          shardingAlgorithmName: t_order_mod
          shardingColumn: order_id
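
To illustrate why globally unique actual table names would be enough, here is a minimal sketch in Python. It is not ShardingSphere code: `expand` and `build_table_index` are hypothetical helpers, and the expander only understands the `${n}` and `${a..b}` forms used in the example above, not the full inline expression syntax.

```python
import itertools
import re

# Matches ${n} or ${a..b}; other inline-expression forms are NOT handled (assumption).
_PLACEHOLDER = re.compile(r"\$\{(\d+)(?:\.\.(\d+))?\}")


def expand(segment):
    """Expand one segment such as 'ds_${0}.t_order_${0..3}' into concrete data nodes."""
    literals, choices, pos = [], [], 0
    for match in _PLACEHOLDER.finditer(segment):
        literals.append(segment[pos:match.start()])
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) is not None else start
        choices.append([str(i) for i in range(start, end + 1)])
        pos = match.end()
    tail = segment[pos:]
    return [
        "".join(lit + val for lit, val in zip(literals, combo)) + tail
        for combo in itertools.product(*choices)
    ]


def build_table_index(actual_data_nodes):
    """Map each actual table name to its data source, rejecting duplicate table names."""
    index = {}
    # Splitting on ',' is safe only because the supported placeholders contain no commas.
    for segment in actual_data_nodes.split(","):
        for node in expand(segment.strip()):
            data_source, table = node.split(".", 1)
            if table in index:
                raise ValueError(f"actual table name {table!r} is not globally unique")
            index[table] = data_source
    return index


index = build_table_index("ds_${0}.t_order_${0..3},ds_${1}.t_order_${4..7}")
# Once an algorithm has picked an actual table, the data source follows from the index.
print(index["t_order_5"])
```

With such an index, an algorithm would only have to choose an actual table name; a name that is missing from the index would indicate a route to a non-existent node.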

In addition, one benefit I can think of from this change is that the actualDataNodes of the existing autoTables are automatically generated by ShardingRule. We can also consider maintaining the automatically generated actualDataNodes in the newly added API, so that the data distribution is known to users. On this basis, we can remove the standard sharding algorithm and the automatic sharding algorithm from the existing sharding algorithms, and all algorithms can be universal because they only need to route the actualDataNodes.

For the different sharding algorithm types, you can refer to this doc: https://shardingsphere.apache.org/document/current/en/user-manual/common-config/builtin-algorithm/sharding/

Tasks:

  • [ ] Add new actualDataNodes YAML configuration
  • [ ] Add new actualDataNodes for DistSQL
  • [ ] Init sharding and table rule according to the new API
  • [ ] Adapt sharding SQL route logic for autoTables actualDataNodes
  • [ ] Enhance configuration check logic when table sharding is based on database sharding (keep the same actual table name)
  • [ ] Adjust how autoTables actualDataSources is expanded in the DistSQL and YAML handling logic (only persist actualDataNodes to zk instead of actualDataSources)
  • [ ] Remove ShardingAutoTableAlgorithm interface
  • [ ] Modify related docs
  • [ ] Add more unit tests and e2e tests

strongduanmu avatar Oct 23 '24 07:10 strongduanmu

Hi @strongduanmu, I am willing to work on this issue.

Yash-cor avatar Nov 05 '24 07:11 Yash-cor

Hi @Yash-cor, this task is somewhat difficult. Are you familiar with the sharding feature?

strongduanmu avatar Nov 05 '24 23:11 strongduanmu

Yes, I am familiar with autoTables and have reviewed the ShardingSphere documentation. I plan to deepen my understanding of how it works within ShardingSphere, and I will begin working on this issue accordingly.

Yash-cor avatar Nov 06 '24 06:11 Yash-cor

@Yash-cor This sounds great, and you can organize the details of the code changes first, which will help ensure that you are working in the right direction.

strongduanmu avatar Nov 06 '24 07:11 strongduanmu

Hello @strongduanmu, sorry for the late response. I want to confirm that we should keep actualDataSources and add the new actualDataNodes alongside it, because if we delete actualDataSources, the configuration becomes similar to a normal sharding rule.

Yash-cor avatar Nov 26 '24 08:11 Yash-cor

@Yash-cor The ultimate goal is to treat actualDataSources as syntactic sugar that is converted to actualDataNodes in advance. I think only actualDataNodes should remain in the metadata in the end.

strongduanmu avatar Dec 03 '24 08:12 strongduanmu

But in our YAML config file, when using autoTables, we specified actualDataSources, and that was converted internally to actualDataNodes.

Now we would have to change the autoTables rule configuration and write actualDataNodes instead of actualDataSources in the YAML config files.

This would make the autoTables rule the same as the normal sharding rule.

Yash-cor avatar Dec 03 '24 09:12 Yash-cor

@Yash-cor Can you give an example to illustrate your question? Ultimately, what we want to achieve is two sets of YAML APIs and two sets of DistSQL, where autoTables is just syntactic sugar for tables, and actualDataSources is likewise syntactic sugar for actualDataNodes. We need to convert them to tables and actualDataNodes before persisting metadata and initializing ShardingRule. Because they only exist as syntactic sugar, when we run SHOW SHARDING RULE, the configuration displayed should be the corresponding tables and actualDataNodes.
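
As a rough illustration of the "syntactic sugar" idea, the sketch below expands an autoTables-style declaration (data sources plus a sharding count) into explicit data nodes before anything is persisted. It is written in Python, not taken from ShardingSphere; `to_actual_data_nodes` is a hypothetical helper, and the round-robin placement of tables across data sources is an assumption made for the example.

```python
def to_actual_data_nodes(logic_table, data_sources, sharding_count):
    """Expand the sugar form into explicit 'dataSource.table' nodes.

    Assumption: table i is placed on data source i % len(data_sources) (round robin).
    """
    if not data_sources:
        raise ValueError("at least one data source is required")
    return [
        f"{data_sources[i % len(data_sources)]}.{logic_table}_{i}"
        for i in range(sharding_count)
    ]


nodes = to_actual_data_nodes("t_order", ["ds_0", "ds_1"], 4)
# Only this expanded form would be stored in metadata and shown to users.
print(",".join(nodes))
```

Under that assumption, the metadata and the output of SHOW SHARDING RULE would contain only the expanded node list, regardless of which form the user originally wrote.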

strongduanmu avatar Dec 03 '24 09:12 strongduanmu

Hello @strongduanmu I recently revisited this issue and raised PR #34622.

I added the actualDataNodes configuration for auto tables and made the necessary changes to the routeContext. However, I need some clarification on the expected behavior of ModShardingAlgorithm and HashModShardingAlgorithm with respect to actualDataNodes.

Currently, I have kept the working of both the Mod and HashMod algorithms the same. However, during routing, when auto tables are used, I set the database sharding strategy similar to the table sharding strategy. This ensures that a single data node is selected during the insert operation.

Could you please confirm if this approach is correct?

Yash-cor avatar Feb 11 '25 09:02 Yash-cor