Optimize Delta joins with sealed inputs

Open ggevay opened this issue 2 years ago • 1 comments

REFRESH NEVER materialized views will seal the Persist shard. As @aalexandrov pointed out, binary joins on such MVs could entirely avoid arranging the other (the non-sealed) input. (Slack discussion.)

Differential joins already partially do something like this: they do form a batch for the initial snapshot on both inputs, but then they don't maintain an arrangement. This already makes joining with a REFRESH NEVER MV better than with a normal MV in the steady state, but it doesn't improve the situation for the initial snapshot.

However, we could make Delta joins entirely avoid arranging the non-sealed input:

  • In the initial snapshot: If we put the non-sealed input as the first input, then the initial snapshot doesn't need an arrangement for the non-sealed input, because only the first input's join path is active for the initial snapshot.
  • For later updates, only the non-sealed side will actually provide updates, so the join path that starts at the sealed side doesn't need to be active, hence we again don't need an arrangement for the non-sealed input.
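
The two bullets above can be illustrated with a toy model. This is only a sketch, not Materialize's Delta join code: `arrange` and `delta_path` are hypothetical helpers, updates are simplified to `(key, value, diff)` triples, and timestamps are left out. It shows that both the initial snapshot and the later updates of the non-sealed input can be handled by a single join path that only does lookups into the sealed input's arrangement.

```python
from collections import defaultdict

def arrange(updates):
    """Index (key, value, diff) updates by key: a toy stand-in for an arrangement."""
    index = defaultdict(list)
    for key, value, diff in updates:
        index[key].append((value, diff))
    return index

def delta_path(changes, other_arrangement):
    """One join path: stream `changes` and look each key up in the other input's arrangement."""
    out = []
    for key, value, diff in changes:
        for other_value, other_diff in other_arrangement.get(key, []):
            out.append((key, value, other_value, diff * other_diff))
    return out

# Sealed input (think of a REFRESH NEVER MV): arranged once, never changes afterwards.
sealed_arrangement = arrange([(1, "a", 1), (2, "b", 1)])

# Non-sealed input: an initial snapshot followed by later updates.
# It is only ever streamed through the join path, never arranged.
snapshot = [(1, "x", 1), (3, "y", 1)]
later_updates = [(2, "z", 1), (1, "x", -1)]

initial_output = delta_path(snapshot, sealed_arrangement)
incremental_output = delta_path(later_updates, sealed_arrangement)
```

The path that would start at the sealed input is never needed here, because that input produces no updates after its snapshot. That path is the only consumer of an arrangement of the non-sealed input.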

We should tweak the Delta join code to do the above optimization, and then tweak the optimizer to always plan a Delta join for such binary inputs where one input is a sealed Persist shard.

Relatedly, we could make a similar optimization for Delta joins in one-shot SELECTs: they don't need to arrange the first input.

ggevay avatar Nov 14 '23 13:11 ggevay

Note that we need the concept of an unchanging collection: one that doesn't change after some timestamp s. We are not going to read at [], so we need a way to know that a collection won't change after timestamp s.
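
As a rough sketch of what such a concept could look like (all names here are hypothetical, and integer timestamps are assumed for simplicity), the planner would need a per-collection bound s and a check against the time the dataflow reads from:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CollectionInfo:
    # Hypothetical field: the timestamp s after which the collection
    # no longer changes, or None if it may still change.
    unchanging_after: Optional[int]

def is_static_for_read(info: CollectionInfo, as_of: int) -> bool:
    """True if a dataflow reading from `as_of` onward sees only a snapshot of this input."""
    return info.unchanging_after is not None and as_of >= info.unchanging_after
```

Under these assumptions, the optimization described above would apply only to inputs for which this check holds at the dataflow's read time.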

ggevay avatar Apr 29 '24 21:04 ggevay