souffle Unnecessary iteration in query plan with records with all fields bound

Unnecessary iteration in query plan with records with all fields bound

Open langston-barrett opened this issue 2 years ago • 5 comments

Check out this program that uses records:

.type Pair = [?x: number, ?y: number]

.decl n(x: number)
n(0).
n(1).
n(2).

.decl r(p: Pair)
r([x, y]) :- n(x), q(y).

.decl q(x: number)
q(x) :-
  n(x),
  r([x, x]).

.output r
.output q

Souffle 2.2 outputs this query plan with --show=transformed-ram:

   DEBUG "q(x) :- \n   n(x),\n   r([x,x]).\nin file /mate/record.dl [12:1-14:13]"
    QUERY
     IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
      FOR t0 IN n
       IF (NOT (t0.0) IN q)
        FOR t1 IN @delta_r
         UNPACK t2 ARITY 2 FROM t1.0
          IF ((t0.0 = t2.0) AND (t0.0 = t2.1))
           INSERT (t0.0) INTO @new_q
    END QUERY
   END DEBUG

This seems sub-optimal - we have bindings for all of the fields of the record [x, x] from n, so we should be able to do a containment check (IN) rather than iterating/scanning (FOR) over @delta_r. I don't know if this is valid RAM syntax, but I'm imagining something like the following:

   DEBUG "q(x) :- \n   n(x),\n   r([x,x]).\nin file /mate/record.dl [12:1-14:13]"
    QUERY
     IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
      FOR t0 IN n
       IF (NOT (t0.0) IN q)
        IF PACK(t0.0, t0.0) IN @delta_r
          INSERT (t0.0) INTO @new_q
    END QUERY
   END DEBUG

For what it's worth, in my actual codebase the relation containing the records is an eqrel, so I can't really refactor my code to avoid the records. Happy to post more details about that code if it's helpful.

Mar 11 '22 15:03 langston-barrett

Here's the behavior for a version without records:

.decl n(x: number)
n(0).
n(1).
n(2).

.decl r(x: number, y: number)
r(x, y) :- n(x), q(y).

.decl q(x: number)
q(x) :-
  n(x),
  r(x, x).

.output r
.output q

   DEBUG "q(x) :- \n   n(x),\n   r(x,x).\nin file /mate/record.dl [10:1-12:11]"
    QUERY
     IF ((NOT ISEMPTY(n)) AND (NOT ISEMPTY(@delta_r)))
      FOR t0 IN n
       IF ((t0.0,t0.0) IN @delta_r AND (NOT (t0.0) IN q))
        INSERT (t0.0) INTO @new_q
    END QUERY
   END DEBUG

Ideally, the version with records would emulate this closely, with the only overhead being a few lookups in the record table.

Mar 11 '22 15:03 langston-barrett

Note that it is possible to get the desired query plan with this modification to the rule for q:

.decl q(x: number)
q(x) :-
  n(x),
  as(p, Pair) = [x, x],
  r(p).

   DEBUG "q(x) :- \n   n(x),\n   as(p, Pair) = [x,x],\n   r(p).\nin file /mate/record.dl [12:1-15:8]"
    QUERY
     IF ((NOT ISEMPTY(n)) AND (NOT ISEMPTY(@delta_r)))
      FOR t0 IN n
       IF ((PACK(t0.0,t0.0)) IN @delta_r AND (NOT (t0.0) IN q))
        INSERT (t0.0) INTO @new_q
    END QUERY
   END DEBUG

However, this requires an unsafe type cast and knowledge of RAM to figure out, so is still sub-optimal.

Mar 11 '22 17:03 langston-barrett

Note that it is possible to get the desired query plan with this modification to the rule for q:

I spoke too soon: This workaround isn't viable in the case where n recursively depends on r:

.type Pair = [?x: number, ?y: number]

.decl n(x: number)
n(0).
n(x) :- r([_, x]).

.decl r(p: Pair)
r([x, y]) :- n(x), q(y).

.decl q(x: number)
q(x) :-
  n(x),
  as(p, Pair) = [x, x],
  r(p).
.plan 1: (2, 1)

.output r
.output q

In this case, you get an inefficient scan of n instead of the desired index into it:

   DEBUG "q(x) :- \n   n(x),\n   as(p, Pair) = [x,x],\n   r(p). .plan 1:(2,1)\nin file /mate/plan.dl [11:1-14:8]"
    QUERY
     IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
      FOR t0 IN @delta_r
       FOR t1 IN n
        IF ((t0.0 = PACK(t1.0,t1.0)) AND (NOT (t1.0) IN q))
         INSERT (t1.0) INTO @new_q
    END QUERY
   END DEBUG

Here is the plan I would hope to get (not sure if this is valid RAM syntax):

   DEBUG "q(x) :- \n   n(x),\n   as(p, Pair) = [x,x],\n   r(p). .plan 1:(2,1)\nin file /mate/plan.dl [11:1-14:8]"
    QUERY
     IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
      FOR t0 IN @delta_r
        IF (UNPACK(t0).1 IN n) AND (NOT (t1.0) IN q))
         INSERT (t1.0) INTO @new_q
    END QUERY
   END DEBUG

Mar 16 '22 19:03 langston-barrett

Interestingly, ADTs might have the right behavior:

.type Pair = P {?x: number, ?y: number}

.decl n(x: number)
n(0).
n(x) :- r($P(_, x)).

.decl r(p: Pair)
r($P(x, y)) :- n(x), q(y).

.decl q(x: number)
q(x) :-
  n(x),
  r($P(x,x)).
.plan 1: (2, 1)

.output r
.output q

Gives this plan:

   DEBUG "q(x) :- \n   n(x),\n   r($P(x, x)). .plan 1:(2,1)\nin file /mate/adt.dl [11:1-13:14]"
    QUERY
     IF ((NOT ISEMPTY(@delta_r)) AND (NOT ISEMPTY(n)))
      FOR t0 IN @delta_r
       UNPACK t1 ARITY 2 FROM t0.0
        IF (t1.0 = NUMBER(0))
         UNPACK t2 ARITY 2 FROM t1.1
          IF (((t2.0 = t2.1) AND (t2.0) IN n) AND (NOT (t2.0) IN q))
           INSERT (t2.0) INTO @new_q
    END QUERY
   END DEBUG

Mar 16 '22 22:03 thinkmoore

We would need a RAM transformer for optimising record accesses.

Mar 17 '22 00:03 b-scholz

souffle souffle copied to clipboard

Unnecessary iteration in query plan with records with all fields bound

souffle
souffle copied to clipboard