Support IN query batch split rewrite

Open wy1433 opened this issue 1 month ago • 0 comments

Feature Request

Is your feature request related to a problem?

Yes. Currently, ShardingSphere handles INSERT statements with batch value splitting: each value in an INSERT INTO ... VALUES clause is routed individually, and the values are then merged per target data node. However, SELECT queries with IN expressions (e.g., SELECT * FROM t_order WHERE order_id IN (1, 2, 3)) are not handled the same way.

When executing a SELECT query with an IN expression on a sharding key, if different values route to different shards, the current implementation sends the full list of IN values to every matched shard. Each shard then receives and evaluates values that can never match its own rows, resulting in unnecessary data transfer and processing.

Describe the feature you would like.

Implement IN query batch split rewrite similar to INSERT VALUES handling:

1. Parse IN expression values and route each value individually
2. Track which values route to which data nodes (like originalDataNodes for INSERT)
3. During SQL rewrite, filter IN values per route unit to only include values that route to that specific shard
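
To make the first two steps concrete, here is a minimal sketch in plain Java. It is not ShardingSphere code: the class InValueSplitSketch, the method groupByDataNode, and the modulo-2 routing rule in main are all hypothetical and stand in for whatever the configured sharding algorithm would return.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in ShardingSphere.
final class InValueSplitSketch {

    // Route every IN value on its own and remember which data node it maps to.
    // The router is supplied by the caller and stands in for the sharding algorithm.
    static <T> Map<String, List<T>> groupByDataNode(List<T> inValues, Function<T, String> router) {
        Map<String, List<T>> valuesByDataNode = new LinkedHashMap<>();
        for (T value : inValues) {
            valuesByDataNode.computeIfAbsent(router.apply(value), key -> new ArrayList<>()).add(value);
        }
        return valuesByDataNode;
    }

    public static void main(String[] args) {
        // Assumed rule for the demo: order_id % 2 selects the data source.
        Function<Long, String> router = id -> "ds_" + (id % 2);
        System.out.println(groupByDataNode(List.of(1L, 2L, 3L), router));
    }
}
```

The resulting map plays the role that originalDataNodes plays for INSERT: the rewrite step can look up the values that belong to a given route unit. Which value lands on which shard depends entirely on the sharding rule, so the grouping under this demo rule will not necessarily match the illustrative example under Expected Behavior.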

Key Components:

InValueContext: Store IN expression structural information (similar to InsertValueContext)
ShardingInValuesToken: Token for rewriting IN values based on route unit
ShardingInValuesTokenGenerator: Generate tokens for IN expressions
ShardingStandardRouteEngine: Add logic to split IN query conditions

Expected Behavior:

Before (current):

All matched shards receive the full IN list (only the table name differs per shard), e.g. SELECT * FROM t_order_0 WHERE order_id IN (1, 2, 3)

After (proposed):

ds_0 receives: SELECT * FROM t_order_0 WHERE order_id IN (1)
ds_1 receives: SELECT * FROM t_order_1 WHERE order_id IN (2, 3)
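
As a rough sketch of how the rewrite step could produce the two statements above, assume the IN values have already been grouped by data source. The class InValuesRewriteSketch and its rewrite method are hypothetical stand-ins for the proposed token, not existing ShardingSphere APIs, and the query shape is hard-coded to the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the proposed rewrite token; not ShardingSphere code.
final class InValuesRewriteSketch {

    // IN values already grouped by the data source they route to.
    private final Map<String, List<Long>> valuesByDataSource;

    InValuesRewriteSketch(Map<String, List<Long>> valuesByDataSource) {
        this.valuesByDataSource = valuesByDataSource;
    }

    // Build the statement for one route unit, keeping only the values routed to it.
    String rewrite(String dataSource, String actualTable) {
        List<Long> values = valuesByDataSource.getOrDefault(dataSource, List.of());
        if (values.isEmpty()) {
            // A shard with no matching values should not be queried at all.
            throw new IllegalArgumentException("no IN values route to " + dataSource);
        }
        String inList = values.stream().map(String::valueOf).collect(Collectors.joining(", "));
        return "SELECT * FROM " + actualTable + " WHERE order_id IN (" + inList + ")";
    }

    public static void main(String[] args) {
        InValuesRewriteSketch sketch = new InValuesRewriteSketch(
                Map.of("ds_0", List.of(1L), "ds_1", List.of(2L, 3L)));
        System.out.println(sketch.rewrite("ds_0", "t_order_0"));
        System.out.println(sketch.rewrite("ds_1", "t_order_1"));
    }
}
```

A real implementation would also have to handle parameter markers (IN (?, ?, ?)) by filtering the bound parameters for each route unit, which this sketch leaves out.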

Benefits:

Reduced data transfer to each shard (shorter IN lists and fewer bound parameters)
More efficient query execution per shard
Consistent behavior between INSERT and SELECT sharding optimizations

Dec 08 '25 02:12 wy1433