marten icon indicating copy to clipboard operation
marten copied to clipboard

Async Projection Sharding

Open jeremydmiller opened this issue 3 years ago • 0 comments

Even in the prior async daemon, we executed each individual projection in its own independent track. In V4, let's try to parallelize projection processing within a single projection wherever possible. The V4 async daemon code is already somewhat built around this idea, but there's only one shard per projection so far.

See the AsyncProjectionShards() method below. Today these methods always return a single shard.

    internal interface IProjectionSource
    {
        string ProjectionName { get; }

        IProjection Build(DocumentStore store);

        IReadOnlyList<IAsyncProjectionShard> AsyncProjectionShards(IDocumentStore store, ITenancy tenancy);
    }

Some ideas:

  • The obvious way is to use the tenant id in multi-tenanted situations. Maybe you say there's one shard for a certain tenant or range of tenants, and other shards for different tenants. That would allow us to spin up multiple ProjectionAgent for different logical shards of the same projection
  • For aggregations by the stream, maybe create a synthetic way based on the stream id to assign the entire shard to one shard or another. Could do that randomly when a stream is created, could do it based on the stream id. In this case, I'd vote for us to record the new stream "shard name" in a new column in the streams table to make it easy to query events for one shard at a time.
  • Some kind of after the fact tagging/sharding strategy? Maybe utilize the projection dependency (#1714) to first assign streams or events to a shard, then secondly to build the actual projection.

jeremydmiller avatar Jan 23 '21 19:01 jeremydmiller