dyna icon indicating copy to clipboard operation
dyna copied to clipboard

use overlapping variables to help choose among plans?

Open jeisner opened this issue 11 years ago • 6 comments

In the max-product algorithm, I wrote this (where L is a label on the edge):

ecount(L,U,V) += alpha(U)*edge(L,U,V)*beta(V).

It seemed to be extremely slow (I had to abort it). I am guessing that the planner picked a suboptimal order alpha(U), beta(V), edge(L,U,V). When I forced a smarter binarization it worked fine:

alpha(L,U,V) += alpha(U)*edge(L,U,V).  % temp variable
ecount(L,U,V) += alpha(L,U,V)*beta(V).

I know we don't really have plan costs yet, but could we do any better here using simple heuristics? The good plan here only starts with one completely free subquery alpha(U), whereas the bad plan starts with two completely free subqueries alpha(U),beta(V).

jeisner avatar Jul 17 '13 19:07 jeisner

I can confirm that the planner picked alpha(U), beta(V), edge(L,U,V) as the order.

Without a cardinality estimate there is no reason to believe that edge/3 is any smaller than alpha \cross beta... or even related to alpha and beta (static analysis can probably tell us that)... I suppose this is even the case with a binary relation edge(U,V) -- with out the L.

timvieira avatar Jul 17 '13 21:07 timvieira

Yes, I know there are no guarantees and we really want cardinality. I was suggesting a heuristic that might tend to work in the meantime (i.e., better than nothing) so that the students can do the assignment. Maybe you have compelling counterexamples? But my thinking was that when there are overlapping variables, they are "intended" to be used to restrict the cardinality, and furthermore that there are diminishing returns (so grounding a second variable as in the current planner's plan is not as helpful as grounding the first one).

Anything you know about det and multi should override this. If binding a variable drives the mode from multi to det, you already know that helped. And if the mode was already det, then binding the variable can't really improve it. My suggestion was only that going from multi to multi should be assumed to involve some kind of cardinality reduction.

Without a cardinality estimate there is no reason to believe that edge/3 is any smaller than alpha \cross beta... or even related to alpha and beta (static analysis can probably tell us that)...

The idea of using static analysis to get subsets is intriguing. It's probably not going to work here, though, because the U positions of the different subqueries are not in a subset relation. The terms that can show up in U position for alpha are a subset of things that can show up in V position for edge (as well as the start vertex). Thus, there may be reachable states U with no out-edges, and unreachable states that do have out-eges.

jeisner avatar Jul 18 '13 02:07 jeisner

The planner is picking alpha(U), beta(V), edge(L,U,V) as each such query binds two free variables (including the result of is). The alternative, alpha(U), edge(L,U,V), beta(V), requires running out is edge(out,in,out), which we've told it is much worse than can be made up for the fact that the next query is out is beta(in). I don't understand (sorry) the alternative heuristic you are proposing?

nwf avatar Jul 18 '13 05:07 nwf

Let me try to understand your current heuristic first. Why "much worse"? In both cases, you have nested loops over 3 variables, just in different orders (U,V,L versus U,[L,V]) and with different tests.

jeisner avatar Jul 18 '13 06:07 jeisner

The current heuristic (exponentially) prefers steps that bind fewer variables to ones that bind more. Thus the binding of three variables in out is edge(out,in,out) is much worse than the binding of two in out is edge(out,in,in) as far as it is concerned. This is countered by a preference for plans with fewer loops, but apparently not enough to help here, or we are not tracking determinism correctly (it would not shock me), or both.

nwf avatar Jul 18 '13 07:07 nwf

Yes, but if you bind 3 variables rather than 2 in the edge goal, then this pays off in the next step whenyou get to bind 0 rather than 1 in the beta goal. So how do you trade these things off against each other? Is it greedy? Or do you mean that the costs are something like (exp 3 + exp 0) for the first plan which is worse than (exp 2 + exp 1) for the second plan? If you were aggregating as exp 3 * exp 0 versus exp 2 * exp 1, that would be a wash, i.e., they both have 3 nested loops

jeisner avatar Jul 19 '13 03:07 jeisner