Support IN query batch split rewrite
Feature Request
Is your feature request related to a problem?
Yes. ShardingSphere currently splits batch INSERT statements by value: each row in an INSERT INTO ... VALUES clause is routed individually, and the rows are then regrouped by their target data nodes.
However, SELECT queries with IN expressions (e.g., SELECT * FROM t_order WHERE order_id IN (1, 2, 3)) are not handled the same way.
When a SELECT query has an IN expression on the sharding key and its values route to different shards, the current implementation sends every IN value to every matched shard. This causes unnecessary data transfer and processing.
Describe the feature you would like.
Implement IN query batch split rewrite similar to INSERT VALUES handling:
- Parse IN expression values and route each value individually
- Track which values route to which data nodes (like originalDataNodes for INSERT)
- During SQL rewrite, filter IN values per route unit to include only the values that route to that specific shard
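The three steps above can be sketched in Java as follows. This is a minimal illustration, not ShardingSphere code: splitByDataNode is a hypothetical helper. The router function stands in for the configured sharding algorithm and uses a made-up rule that sends order_id 1 to ds_0.t_order_0 and every other value to ds_1.t_order_1, so that it matches the example in this issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class InValueSplitSketch {

    /**
     * Routes each IN value on its own and groups the values by the data node
     * they land on. LinkedHashMap keeps data nodes in first-seen order, and each
     * list keeps the values in their original IN order.
     */
    static Map<String, List<Long>> splitByDataNode(List<Long> inValues, Function<Long, String> router) {
        Map<String, List<Long>> valuesByDataNode = new LinkedHashMap<>();
        for (Long value : inValues) {
            // Route this single value (hypothetical stand-in for the sharding algorithm).
            String dataNode = router.apply(value);
            valuesByDataNode.computeIfAbsent(dataNode, key -> new ArrayList<>()).add(value);
        }
        return valuesByDataNode;
    }

    public static void main(String[] args) {
        // Hypothetical rule matching this issue's example: 1 -> ds_0, 2 and 3 -> ds_1.
        Function<Long, String> router = id -> id < 2 ? "ds_0.t_order_0" : "ds_1.t_order_1";
        System.out.println(splitByDataNode(List.of(1L, 2L, 3L), router));
        // prints {ds_0.t_order_0=[1], ds_1.t_order_1=[2, 3]}
    }
}
```

A route unit is created only for a data node that received at least one value. As a result, filtering by this map never leaves a shard with an empty IN ().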
Key Components:
- InValueContext: Store IN expression structural information (similar to InsertValueContext)
- ShardingInValuesToken: Token for rewriting IN values based on route unit
- ShardingInValuesTokenGenerator: Generate tokens for IN expressions
- ShardingStandardRouteEngine: Add logic to split IN query conditions
Expected Behavior:
Before (current):
- ds_0 receives:
SELECT * FROM t_order_0 WHERE order_id IN (1, 2, 3)
- ds_1 receives:
SELECT * FROM t_order_1 WHERE order_id IN (1, 2, 3)
After (proposed):
- ds_0 receives:
SELECT * FROM t_order_0 WHERE order_id IN (1)
- ds_1 receives:
SELECT * FROM t_order_1 WHERE order_id IN (2, 3)
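The following hypothetical Java sketch makes the expected output concrete. It rewrites the logical SQL once per route unit, replacing both the table name and the IN list with that unit's own values. RouteUnit and rewrite are illustrative names, not ShardingSphere classes. A parser would normally record the character positions (roughly the kind of information InValueContext is meant to hold), but here they are located with indexOf.

```java
import java.util.List;
import java.util.StringJoiner;

public class InValuesRewriteSketch {

    /** Hypothetical route unit: target data source, actual table, and the IN values routed to it. */
    record RouteUnit(String dataSource, String actualTable, List<Long> inValues) { }

    /**
     * Rewrites the logical SQL for one route unit. Stop indexes are inclusive.
     * Assumes the table name span ends before the IN list span starts.
     */
    static String rewrite(String sql, int tableStart, int tableStop, int inStart, int inStop, RouteUnit unit) {
        StringJoiner values = new StringJoiner(", ", "(", ")");
        unit.inValues().forEach(value -> values.add(String.valueOf(value)));
        // Splice the later span (IN list) first so the earlier table indexes stay valid.
        String withValues = sql.substring(0, inStart) + values + sql.substring(inStop + 1);
        return withValues.substring(0, tableStart) + unit.actualTable() + withValues.substring(tableStop + 1);
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM t_order WHERE order_id IN (1, 2, 3)";
        int tableStart = sql.indexOf("t_order");
        int tableStop = tableStart + "t_order".length() - 1;
        int inStart = sql.indexOf('(');
        int inStop = sql.lastIndexOf(')');
        List<RouteUnit> units = List.of(
                new RouteUnit("ds_0", "t_order_0", List.of(1L)),
                new RouteUnit("ds_1", "t_order_1", List.of(2L, 3L)));
        for (RouteUnit unit : units) {
            System.out.println(unit.dataSource() + ": " + rewrite(sql, tableStart, tableStop, inStart, inStop, unit));
        }
        // prints:
        // ds_0: SELECT * FROM t_order_0 WHERE order_id IN (1)
        // ds_1: SELECT * FROM t_order_1 WHERE order_id IN (2, 3)
    }
}
```

Splicing from right to left avoids recomputing offsets after each replacement. A rewrite engine that applies several tokens to one statement faces the same ordering concern.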
Benefits:
- Less data sent to each shard, since each statement carries only the IN values routed to it
- More efficient query execution per shard
- Consistent behavior between INSERT and SELECT sharding optimizations