cascalog
Cascalog should detect duplicate operations and rewrite the query to avoid wasted work
For example, c/count may be applied twice in one query, which is especially likely when predicate macros are involved.
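To make the idea concrete, here is a minimal sketch of duplicate detection in plain Clojure. It does not use Cascalog's planner: the map-based predicate shape ({:op, :in, :out}) and the name dedupe-ops are assumptions made up for illustration. It also assumes every operation is deterministic and free of side effects, so two applications of the same op to the same inputs can safely share one result.

```clojure
;; Hypothetical model, not Cascalog's internal representation:
;; a predicate is {:op <operation> :in [input vars] :out [output vars]}.
(defn dedupe-ops
  "Collapses predicates that apply the same :op to the same inputs.
  Returns {:kept    predicates that survive, in their original order
           :renames map from dropped output vars to the kept ones}."
  [predicates]
  (-> (reduce
       (fn [{:keys [seen renames] :as acc} pred]
         (let [;; Rewrite inputs that refer to outputs already dropped.
               in (mapv #(get renames % %) (:in pred))
               k  [(:op pred) in]]
           (if-let [kept-out (get seen k)]
             ;; Duplicate: drop it and point its outputs at the earlier ones.
             (update acc :renames merge (zipmap (:out pred) kept-out))
             (-> acc
                 (update :kept conj (assoc pred :in in))
                 (assoc-in [:seen k] (:out pred))))))
       {:kept [] :seen {} :renames {}}
       predicates)
      (select-keys [:kept :renames])))

;; Two counts in one query collapse into a single count.
(dedupe-ops [{:op :count :in []     :out ["?c1"]}
             {:op :inc   :in ["?x"] :out ["?y"]}
             {:op :count :in []     :out ["?c2"]}])
```

A real rewrite would also need to apply the :renames map to the query's output fields, which this sketch leaves to the caller.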
We should also remove operations that are never actually called. This should work:
(defn throw! [& xs] (throw (RuntimeException. "gotcha!")))
(??<- [?x] ([[1]] ?x) (throw! ?x :> ?nothing-meaningful))
;; should => ([1])
;; actually throws error.
I don't agree with that. Sometimes you want to do some sort of side effect in a query.
Hmm, interesting. I can't think of a side-effecting operation that produces unused output variables. On the other hand, if that particular optimization existed, we'd be able to take advantage of it like this:
(def query (<- [?x ?y ?z ?a ?b] ....))
(select-fields query ["?x"]) ;; produces an optimized version of query by cutting
;; the ops that produce ?y, ?z, ?a and ?b (and don't affect ?x's production or filtering)
Interesting. I think the right approach then is to have the ability to mark an operation as "has side effects" which will then cause Cascalog to never remove those operations from the query plan.
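A rough sketch of how such a marker could interact with dead-operation removal, again in plain Clojure rather than against Cascalog's planner. The predicate maps, the :side-effects? flag, and the name prune-ops are all hypothetical. The sketch assumes predicates are listed with producers before consumers, and it treats a predicate with no outputs as a filter that must always be kept.

```clojure
;; Hypothetical model, not Cascalog's internal representation:
;; a predicate is {:op .. :in [vars] :out [vars]} with an optional
;; :side-effects? flag. Predicates are assumed to be ordered so that
;; producers come before consumers.
(defn prune-ops
  "Keeps only the predicates needed to produce the `wanted` vars.
  Filters (empty :out) and ops flagged :side-effects? are never removed."
  [predicates wanted]
  (loop [needed (set wanted)          ; vars that must still be produced
         todo   (reverse predicates)  ; walk from consumers back to producers
         kept   ()]
    (if-let [{:keys [in out side-effects?] :as pred} (first todo)]
      (if (or side-effects?
              (empty? out)
              (some needed out))
        ;; Keeping this op makes its inputs needed as well.
        (recur (into needed in) (rest todo) (conj kept pred))
        (recur needed (rest todo) kept))
      (vec kept))))

(def src    {:op :source :in []     :out ["?x"]})
(def throw! {:op :throw! :in ["?x"] :out ["?nothing-meaningful"]})

;; Unflagged: throw! produces nothing that is needed, so it is dropped.
(prune-ops [src throw!] ["?x"])

;; Flagged: the same op survives because it declares side effects.
(prune-ops [src (assoc throw! :side-effects? true)] ["?x"])
```

Making the flag opt-in keeps the default aggressive about removing unused work, at the cost of requiring authors of side-effecting operations to remember to set it.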
Also, note that the optimization you showed above is not doable with query rewriting (since the subquery's plan has already been created).
@nathanmarz, the new Cascalog 2.0 planner is lazy and defers all subquery planning until actual execution.