differential-datalog icon indicating copy to clipboard operation
differential-datalog copied to clipboard

Support left joins

Open ryzhyk opened this issue 6 years ago • 11 comments

Left joins are currently implemented using antijoins. Differential dataflow provides a better way.

ryzhyk avatar Dec 12 '18 23:12 ryzhyk

Any progress here? cc @frankmcsherry

djahandarie avatar Apr 20 '20 20:04 djahandarie

Nope.

ryzhyk avatar Apr 20 '20 20:04 ryzhyk

I'm not sure I have all the context on the question, but you should always be able to implement a left join using inner joins and potentially either antijoins or reductions. If you can identify keys in one relation without matching keys in another relation (non-monotonic, unfortunately) then you can perform the inner join, and pad non-matching keys with the default values.

But, as a non-monotonic operation it is not natively implementable in Datalog, I believe (edit: without stratification).

frankmcsherry avatar Apr 20 '20 20:04 frankmcsherry

DDlog does support stratified negation, so we can express left joins, but this issue was about implementing them more efficiently when possible (e.g., when a left join is followed by aggregation). This would require either implementing some form of static analysis to identify the pattern or exposing a new language primitive.

ryzhyk avatar Apr 20 '20 20:04 ryzhyk

I vote for the static analysis. No point in making the language more complex than needed.

mihaibudiu avatar Apr 20 '20 20:04 mihaibudiu

At Materialize we do this with a pretty simple static analysis that goes down from reductions (or filters sometimes) and has the form "records with NULL values in any of these fields can be dropped without affecting the results". When this gets to the union for a left join it descends both branches and .. has no effect on the inner join, but is able to zero out the half that produces NULL values for missing keys.

This would mean that you would need to have a notion of NULL (or a more sane default value; perhaps zero) and I wouldn't wish that on anyone.

frankmcsherry avatar Apr 20 '20 21:04 frankmcsherry

That makes sense, thanks! Part of the problem is that DDlog does not have an explicit left join operation, so we would have to either introduce one or have an even smarter analysis that deduces that a fragment of the program behaves as a left join. I prefer the former.

ryzhyk avatar Apr 20 '20 21:04 ryzhyk

The SQL to DDlog compiler can recognize left joins more easily. And we have a clear convention for representing NULLs too.

mihaibudiu avatar Apr 20 '20 21:04 mihaibudiu

But that's a different problem.

mihaibudiu avatar Apr 20 '20 21:04 mihaibudiu

Just to confirm, right now we have to make two intermediate relations with antijoin and inner join and then combine them into our final relation (or use them directly)? Or is there a more terse syntax that is possible in the meantime.

norcalli avatar Feb 18 '21 00:02 norcalli

That's correct, you need two rules, there is no shortcut to do this more concisely.

ryzhyk avatar Feb 18 '21 00:02 ryzhyk