declarative-dataflow

Attribute Distribution

Open comnik opened this issue 6 years ago • 0 comments

Some relations (such as Datomic's :db/ident, or relations carrying metadata) are comparatively small but can induce massive skew, especially once reverse indices are involved. This can lead to straggling workers and a large volume of data being exchanged between workers.

It might therefore be useful to introduce a new dimension of attribute configuration

Distribution := Sharded | Shuffled | Broadcasted

that would allow us to configure small, critical relations as Broadcasted, while keeping their would-be-skewed join partners entirely local, or shuffled randomly across workers.
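To make the three options concrete, here is a minimal Rust sketch of how such a setting could decide which workers receive a tuple. Everything except the three variant names is an assumption made for illustration: `target_workers`, its parameters, and the use of a round-robin sequence number as a stand-in for random shuffling are hypothetical and not part of the existing codebase.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// How the tuples of an attribute are spread across workers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Distribution {
    /// Each tuple lives on the one worker selected by hashing its key.
    Sharded,
    /// Each tuple lives on one worker, chosen independently of its key.
    Shuffled,
    /// Every worker holds a full copy of the relation.
    Broadcasted,
}

/// Hypothetical helper: the workers that should receive a tuple.
///
/// `seq` is a running tuple counter; it stands in for a random choice
/// so that the sketch stays deterministic.
pub fn target_workers<K: Hash>(
    dist: Distribution,
    key: &K,
    seq: usize,
    peers: usize,
) -> Vec<usize> {
    assert!(peers > 0, "need at least one worker");
    match dist {
        Distribution::Sharded => {
            // Same key, same worker: skewed keys pile up on one worker.
            let mut hasher = DefaultHasher::new();
            key.hash(&mut hasher);
            vec![(hasher.finish() % peers as u64) as usize]
        }
        // Round-robin ignores the key, so hot keys are spread evenly.
        Distribution::Shuffled => vec![seq % peers],
        // A small relation is simply replicated everywhere.
        Distribution::Broadcasted => (0..peers).collect(),
    }
}
```

With four workers, `target_workers(Distribution::Broadcasted, &":db/ident", 0, 4)` yields all four worker indices, whereas the `Sharded` and `Shuffled` cases each yield exactly one.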

The query engine might then need to enforce some new rules for what types of distribution are allowed to go together in a join.
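One possible shape for such rules is sketched below in Rust. The rules themselves are assumptions, not something this issue specifies: a join is treated as safe to run locally when at least one input is broadcast, or when both inputs are sharded (assumed here to mean sharded on the join key); any other combination requires an exchange first. `JoinPlan` and `plan_join` are hypothetical names.

```rust
/// Distribution options as proposed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Distribution {
    Sharded,
    Shuffled,
    Broadcasted,
}

/// Hypothetical outcome of checking a pair of join inputs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JoinPlan {
    /// The join can run on each worker without moving data;
    /// the payload is the distribution of the join's output.
    Local(Distribution),
    /// At least one input must first be exchanged by join key.
    Exchange,
}

/// Assumed compatibility rules for a binary join.
pub fn plan_join(left: Distribution, right: Distribution) -> JoinPlan {
    use Distribution::*;
    match (left, right) {
        // Both inputs are everywhere, so the output is everywhere too.
        (Broadcasted, Broadcasted) => JoinPlan::Local(Broadcasted),
        // The broadcast side is fully present on every worker, so each
        // tuple of the other side finds all of its matches locally and
        // the output inherits that side's distribution.
        (Broadcasted, other) | (other, Broadcasted) => JoinPlan::Local(other),
        // Assumes both sides are sharded on the join key.
        (Sharded, Sharded) => JoinPlan::Local(Sharded),
        // Shuffled with Shuffled or Sharded: matching tuples may sit
        // on different workers, so data has to move.
        _ => JoinPlan::Exchange,
    }
}
```

Under these assumed rules, a broadcast :db/ident relation joined with a shuffled partner gives `JoinPlan::Local(Distribution::Shuffled)`, which is the scenario described above: the skewed partner never has to be exchanged by key.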

comnik, Jul 25 '19 07:07