souffle
souffle copied to clipboard
Unnecessary iteration in query plan with records with all fields bound
Check out this program that uses records:
.type Pair = [?x: number, ?y: number]
.decl n(x: number)
n(0).
n(1).
n(2).
.decl r(p: Pair)
r([x, y]) :- n(x), q(y).
.decl q(x: number)
q(x) :-
n(x),
r([x, x]).
.output r
.output q
Souffle 2.2 outputs this query plan with --show=transformed-ram:
DEBUG "q(x) :- \n n(x),\n r([x,x]).\nin file /mate/record.dl [12:1-14:13]"
QUERY
IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
FOR t0 IN n
IF (NOT (t0.0) IN q)
FOR t1 IN @delta_r
UNPACK t2 ARITY 2 FROM t1.0
IF ((t0.0 = t2.0) AND (t0.0 = t2.1))
INSERT (t0.0) INTO @new_q
END QUERY
END DEBUG
This seems sub-optimal - we have bindings for all of the fields of the record [x, x] from n, so we should be able to do a containment check (IN) rather than iterating/scanning (FOR) over @delta_r. I don't know if this is valid RAM syntax, but I'm imagining something like the following:
DEBUG "q(x) :- \n n(x),\n r([x,x]).\nin file /mate/record.dl [12:1-14:13]"
QUERY
IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
FOR t0 IN n
IF (NOT (t0.0) IN q)
IF PACK(t0.0, t0.0) IN @delta_r
INSERT (t0.0) INTO @new_q
END QUERY
END DEBUG
For what it's worth, in my actual codebase the relation containing the records is an eqrel, so I can't really refactor my code to avoid the records. Happy to post more details about that code if it's helpful.
Here's the behavior for a version without records:
.decl n(x: number)
n(0).
n(1).
n(2).
.decl r(x: number, y: number)
r(x, y) :- n(x), q(y).
.decl q(x: number)
q(x) :-
n(x),
r(x, x).
.output r
.output q
DEBUG "q(x) :- \n n(x),\n r(x,x).\nin file /mate/record.dl [10:1-12:11]"
QUERY
IF ((NOT ISEMPTY(n)) AND (NOT ISEMPTY(@delta_r)))
FOR t0 IN n
IF ((t0.0,t0.0) IN @delta_r AND (NOT (t0.0) IN q))
INSERT (t0.0) INTO @new_q
END QUERY
END DEBUG
Ideally, the version with records would emulate this closely, with the only overhead being a few lookups in the record table.
Note that it is possible to get the desired query plan with this modification to the rule for q:
.decl q(x: number)
q(x) :-
n(x),
as(p, Pair) = [x, x],
r(p).
DEBUG "q(x) :- \n n(x),\n as(p, Pair) = [x,x],\n r(p).\nin file /mate/record.dl [12:1-15:8]"
QUERY
IF ((NOT ISEMPTY(n)) AND (NOT ISEMPTY(@delta_r)))
FOR t0 IN n
IF ((PACK(t0.0,t0.0)) IN @delta_r AND (NOT (t0.0) IN q))
INSERT (t0.0) INTO @new_q
END QUERY
END DEBUG
However, this requires an unsafe type cast and knowledge of RAM to figure out, so is still sub-optimal.
Note that it is possible to get the desired query plan with this modification to the rule for q:
I spoke too soon: This workaround isn't viable in the case where n recursively depends on r:
.type Pair = [?x: number, ?y: number]
.decl n(x: number)
n(0).
n(x) :- r([_, x]).
.decl r(p: Pair)
r([x, y]) :- n(x), q(y).
.decl q(x: number)
q(x) :-
n(x),
as(p, Pair) = [x, x],
r(p).
.plan 1: (2, 1)
.output r
.output q
In this case, you get an inefficient scan of n instead of the desired index into it:
DEBUG "q(x) :- \n n(x),\n as(p, Pair) = [x,x],\n r(p). .plan 1:(2,1)\nin file /mate/plan.dl [11:1-14:8]"
QUERY
IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
FOR t0 IN @delta_r
FOR t1 IN n
IF ((t0.0 = PACK(t1.0,t1.0)) AND (NOT (t1.0) IN q))
INSERT (t1.0) INTO @new_q
END QUERY
END DEBUG
Here is the plan I would hope to get (not sure if this is valid RAM syntax):
DEBUG "q(x) :- \n n(x),\n as(p, Pair) = [x,x],\n r(p). .plan 1:(2,1)\nin file /mate/plan.dl [11:1-14:8]"
QUERY
IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
FOR t0 IN @delta_r
IF (UNPACK(t0).1 IN n) AND (NOT (t1.0) IN q))
INSERT (t1.0) INTO @new_q
END QUERY
END DEBUG
Interestingly, ADTs might have the right behavior:
.type Pair = P {?x: number, ?y: number}
.decl n(x: number)
n(0).
n(x) :- r($P(_, x)).
.decl r(p: Pair)
r($P(x, y)) :- n(x), q(y).
.decl q(x: number)
q(x) :-
n(x),
r($P(x,x)).
.plan 1: (2, 1)
.output r
.output q
Gives this plan:
DEBUG "q(x) :- \n n(x),\n r($P(x, x)). .plan 1:(2,1)\nin file /mate/adt.dl [11:1-13:14]"
QUERY
IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
FOR t0 IN @delta_r
UNPACK t1 ARITY 2 FROM t0.0
IF (t1.0 = NUMBER(0))
UNPACK t2 ARITY 2 FROM t1.1
IF (((t2.0 = t2.1) AND (t2.0) IN n) AND (NOT (t2.0) IN q))
INSERT (t2.0) INTO @new_q
END QUERY
END DEBUG
We would need a RAM transformer for optimising record accesses.