Brighter icon indicating copy to clipboard operation
Brighter copied to clipboard

DynamoDB outbox sharding prevents guaranteed message ordering

Open dhickie opened this issue 8 months ago • 0 comments

Is your feature request related to a problem? Please describe. When messages are added to the Dynamo DB outbox, they are assigned a random "shard" for their partition key, in order to avoid hot partitions for particular topics. When the sweeper queries the outbox for outstanding messages, it iterates through all the available shards. This makes it impossible to guarantee message ordering for any given partition key, as messages for that key will be randomly distributed between the different shards.

Describe the solution you'd like Instead of assigning messages to shards randomly, use a deterministic hashing algorithm that distributes messages evenly between shards while guaranteeing that all messages for a particular partition key are assigned to a particular shard:

var keyBytes = Encoding.UTF8.GetBytes(message.Header.PartitionKey);
var sha256 = SHA256.Create();
var keyHash = sha256.ComputeHash(keyBytes);
var shardNumber = BitConverter.ToUInt32(keyHash, 0) % _configuration.NumberOfShards;

If for whatever reason the partition key isn't specified (as it's optional on the contract) then we can fall back to random assignment.

Describe alternatives you've considered No other alternatives were considered. This would solve the problem at hand, at the cost of some CPU in order to calculate the hash.

Additional context This is important for users using a Dynamo DB outbox with Kafka - if log compaction is enabled for Kafka then it's critical that messages are added to partitions in the order the deposits were originally made to the outbox.

dhickie avatar Apr 25 '25 07:04 dhickie