datascript

Simple hash-join optimization

Open huahaiy opened this issue 5 years ago • 1 comment

In a hash-join of two relations, a simple optimization is to build the hash table on the smaller of the two relations and probe it with the larger one. This simple change can noticeably improve query performance in some cases.

For instance, change q2 of the benchmark from [:find ?e ?a :where [?e :name "Ivan"] [?e :age ?a]] to [:find ?e ?a :where [?e :sex ?a] [?e :name "Ivan"]]. On my machine, the query takes 9.5 ms before this optimization and 6.5 ms after it.
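To illustrate the idea, here is a minimal sketch in Python. It is not DataScript's actual implementation: the `hash_join` function, the representation of a relation as a list of dicts, and the sample data are all assumptions made for this example. The only point it shows is choosing the build side by size before hashing.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two relations (lists of dicts) on `key`.

    Hypothetical helper for illustration: the hash table is always
    built on the smaller relation and probed with the larger one.
    """
    # Pick the build side by size, regardless of argument order.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)

    # Build phase: index the smaller relation by the join key.
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)

    # Probe phase: scan the larger relation and look up matches.
    joined = []
    for row in probe:
        for match in table.get(row[key], ()):
            joined.append({**match, **row})
    return joined


# Made-up sample data: one matching name, several ages.
names = [{"e": 1, "name": "Ivan"}]
ages = [{"e": 1, "age": 30}, {"e": 2, "age": 25}, {"e": 3, "age": 41}]

# The larger relation comes first here, as in the reordered query,
# yet the hash table is still built on `names`.
print(hash_join(ages, names, "e"))
```

With this choice the result does not depend on the order of the arguments, while the number of rows that have to be hashed is bounded by the size of the smaller relation.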

Credit: This optimization is one of the techniques introduced in this paper:

Fan et al. 2019, Scaling-up in-memory datalog processing: observations and techniques, Proceedings of the VLDB Endowment, vol. 12. Full text: http://www.vldb.org/pvldb/vol12/p695-fan.pdf

huahaiy avatar Jul 24 '20 18:07 huahaiy

Thanks! Strange, I was sure I was already doing this :) Seems like not.

tonsky avatar Jul 24 '20 19:07 tonsky