
Cascalog should detect duplicate operations and rewrite the query to avoid wasted work

Open sritchie opened this issue 13 years ago • 5 comments

For example, c/count may be applied twice in one query (which can easily happen when predicate macros are involved).
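To make the request concrete, here is a minimal sketch of the kind of rewrite being asked for. It is not Cascalog's planner: the predicate maps (`:op`, `:in`, `:out`) and the function `dedupe-ops` are hypothetical stand-ins invented for illustration, and the rewrite is only sound if the duplicated operations are pure and deterministic.

```clojure
;; Hypothetical predicate representation, NOT Cascalog's internal format:
;;   {:op <operation id> :in [input vars] :out [output vars]}

(defn dedupe-ops
  "Keeps the first predicate for each distinct [op, inputs] pair.
  Returns {:preds [...] :renames {duplicate-var canonical-var}}."
  [preds]
  (select-keys
   (reduce
    (fn [{:keys [seen renames] :as acc} {:keys [op in out] :as pred}]
      (let [in' (mapv #(get renames % %) in) ; resolve inputs through earlier renames
            k   [op in']]
        (if-let [canonical (get seen k)]
          ;; duplicate: alias its outputs to the first occurrence's outputs
          (update acc :renames into (map vector out canonical))
          ;; first occurrence: keep it and remember which vars it produces
          (-> acc
              (update :preds conj (assoc pred :in in'))
              (assoc-in [:seen k] out)))))
    {:preds [] :seen {} :renames {}}
    preds)
   [:preds :renames]))

(def preds
  [{:op :count :in []     :out ["?n1"]}
   {:op :inc   :in ["?x"] :out ["?y"]}
   {:op :count :in []     :out ["?n2"]}]) ; same op, same inputs as the first

(:renames (dedupe-ops preds))
;; => {"?n2" "?n1"}
```

The second count is dropped and every later use of `?n2` can read `?n1` instead, so the work is done once.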

sritchie avatar Dec 06 '11 03:12 sritchie

We should also remove operations whose outputs are never actually used. This should work:

(defn throw! [& xs] (throw (RuntimeException. "gotcha!")))

(??<- [?x] ([[1]] ?x) (throw! ?x :> ?nothing-meaningful))
;; should => ([1])
;; actually throws error.

sritchie avatar Dec 12 '11 06:12 sritchie

I don't agree with that. Sometimes you want to do some sort of side effect in a query.

nathanmarz avatar Dec 12 '11 06:12 nathanmarz

Hmm, interesting. I can't think of a side-effecting operation that produces unused output variables. On the other hand, if that particular optimization existed, we'd be able to take advantage of it like this:

(def query (<- [?x ?y ?z ?a ?b] ....))
(select-fields query ["?x"])
;; produces an optimized version of query by cutting the ops that produce
;; ?y, ?z, ?a and ?b (and don't affect ?x's production or filtering)

sritchie avatar Dec 12 '11 06:12 sritchie

Interesting. I think the right approach then is to have the ability to mark an operation as "has side effects" which will then cause Cascalog to never remove those operations from the query plan.
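A rough sketch of how such a flag could interact with pruning, under loud assumptions: the predicate maps, the `:side-effects?` key, and `prune-unused` are all hypothetical and do not exist in Cascalog. The sketch walks the predicates backwards, keeping any predicate that is flagged, that has no outputs (treated here as a filter), or that produces a variable still needed downstream.

```clojure
;; Hypothetical representation, NOT Cascalog's API:
;;   {:op <id> :in [vars] :out [vars] :side-effects? <boolean, optional>}
;; Assumes preds are listed in dependency order (producers before consumers).

(defn prune-unused
  "Drops predicates whose outputs are never needed for `wanted`,
  unless they are filters (no outputs) or flagged :side-effects?."
  [preds wanted]
  (let [step (fn [[live kept] {:keys [in out side-effects?] :as pred}]
               (if (or side-effects? (empty? out) (some live out))
                 [(into live in) (cons pred kept)] ; keep; its inputs become needed
                 [live kept]))]                    ; dead; drop it
    (vec (second (reduce step [(set wanted) ()] (reverse preds))))))

(def source {:op :source :in [] :out ["?x"]})
(def thrower {:op :throw! :in ["?x"] :out ["?nothing-meaningful"]})

;; Unflagged: the throwing op is removed because nothing reads its output.
(prune-unused [source thrower] ["?x"])

;; Flagged: the same op survives pruning.
(prune-unused [source (assoc thrower :side-effects? true)] ["?x"])
```

Making the flag opt-in keeps the default aggressive, which matches the proposal above: only operations explicitly marked as side-effecting are protected from removal.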

Also, note that the optimization you showed above is not doable with query rewriting (since the subquery's plan has already been created).

nathanmarz avatar Dec 12 '11 06:12 nathanmarz

@nathanmarz, the new Cascalog 2.0 planner is lazy and defers all subquery planning until actual execution.
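Why lazy planning matters here can be illustrated with a toy model. This is only a sketch of the general idea using Clojure's `delay`, not the 2.0 planner itself; `build-plan`, `make-query`, `narrow`, and `plan-count` are hypothetical names made up for the example.

```clojure
;; Toy model: a "query" carries its predicates plus a delayed plan.
;; None of these names come from Cascalog.

(def plan-count (atom 0)) ; counts how often planning actually runs

(defn build-plan
  "Stand-in for a real planner; just records that it ran."
  [preds out]
  (swap! plan-count inc)
  {:steps (mapv :op preds) :out out})

(defn make-query [preds out]
  {:preds preds
   :out   out
   :plan  (delay (build-plan preds out))}) ; nothing is planned yet

(defn narrow
  "Derives a query with fewer output fields. Because the original plan
  was never forced, the new query is planned from the raw predicates."
  [query fields]
  (make-query (:preds query) (vec fields)))

(def q  (make-query [{:op :a} {:op :b}] ["?x" "?y"]))
(def q' (narrow q ["?x"]))
;; Planning happens only on dereference, e.g. @(:plan q')
```

Since the narrowed query still holds the unplanned predicates, a rewrite pass would have the chance to drop unused operations before any plan is built.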

sritchie avatar Aug 09 '13 06:08 sritchie