datascript icon indicating copy to clipboard operation
datascript copied to clipboard

performance problem with a rule

Open freakhill opened this issue 8 years ago • 5 comments

[(chain ?a ?attr1 ?attr2 ?b) [?a ?attr1 ?i0 ?i0 ?attr2 ?b]]

I have a query that can use this rule 4 times but for which I currently manually expands every potential call.

Execution time when used: 0 rules 4 expands: ~hundred ms 1 rule 3 expands: ~hundred ms 2 rules 2 expands: ~90s 3 rules 1 expand ; JVM OOP

Any idea of what could cause that?

(datascript 0.15.0)

freakhill avatar Mar 23 '16 07:03 freakhill

Could you post the expansions as well? (Some sample data would also be helpful, basically something to easily reproduce the problem)

lvh avatar Apr 25 '16 14:04 lvh

I'm having a similar problem with a recursive ancestor rule. The following query never returns, but if I leave off the last anc clause it returns almost instantly (though that does me no good because it cannot be guaranteed to be right). Also , if I'm dealing <= 4 tokens, the query is fast even if complete (i.e. all necesary anc clauses). I don't know why it would suddenly get hosed at 5 tokens since by the time it gets to that last clause, shouldn't it have narrowed down the range of possible bindings for ?p?

(def schema
  {:child      {:db/valueType   :db.type/ref
                :db/cardinality :db.cardinality/many
                :db/isComponent true
                :db/index       true}
   :token/text {:db/index true}})

(def rules
  '[[(anc ?par ?child)
     (?par :child ?child)]
    [(anc ?anc ?child)
     (?par :child ?child)
     (anc ?anc ?par)]])

(d/q '[:find ?p
            :in $ %
            :where
            [?t0 :token/text "central"]
            [?t1 :token/text "square"]
            [?t2 :token/text "of"]
            [?t3 :token/text "Djemma"]
            [?t4 :token/text "El-Fna"]
            [?t0 :token/index ?i0]
            [?t1 :token/index ?i1]
            [?t2 :token/index ?i2]
            [?t3 :token/index ?i3]
            [?t4 :token/index ?i4]
            [(- ?i1 ?i0) ?diff0]
            [(= ?diff0 1)]
            [(- ?i2 ?i1) ?diff1]
            [(= ?diff1 1)]
            [(- ?i3 ?i2) ?diff2]
            [(= ?diff2 1)]
            [(- ?i4 ?i3) ?diff3]
            [(= ?diff3 1)]
            [?p :sentence/index _]
            (anc ?p ?t0)
            (anc ?p ?t1)
            (anc ?p ?t2)
            (anc ?p ?t3)
            (anc ?p ?t4)]
          @conn rules)

Consider the following data:

[{:sentence/index 0
                      :child [{:parse/tag "ROOT"
                               :child [{:parse/tag "S"
                                        :child [{:parse/tag "NP"
                                                 :child [{:token/index 0
                                                          :token/text "central"}
                                                         {:token/index 1
                                                          :token/text "square"}
                                                         {:token/index 2
                                                          :token/text "of"}
                                                         {:token/index 3
                                                          :token/text "Djemma"}
                                                         {:token/index 4
                                                          :token/text "El-Fna"}]}]}]}]}
 {:sentence/index 1
                      :child [{:parse/tag "ROOT"
                               :child [{:parse/tag "S"
                                        :child [{:parse/tag "NP"
                                                 :child [{:token/index 0
                                                          :token/text "central"}
                                                         {:token/index 1
                                                          :token/text "square"}
                                                         {:token/index 2
                                                          :token/text "of"}
                                                         {:token/index 3
                                                          :token/text "Foo"}
                                                         {:token/index 4
                                                          :token/text "El-Fna"}]}]}]}]}]

If you try the query above, it will run with anc clauses through ?t1 but hangs if you add ?t2 or beyond.

This hangs as you might expect:

(d/q '[:find ?t0 ?t1 ?t2
                                  :in $ %
                                  :where
                                  [?p :sentence/index _]
                                  (anc ?p ?t0)
                                  (anc ?p ?t1)
                                  (anc ?p ?t2)]
                                @conn rules)

So that suggests to me that the bindings are not narrowing from the top down and the hang is because those anc clauses are running independently (to be bound after it completes).

neverfox avatar Dec 02 '16 02:12 neverfox

Wait, is this because, like Datomic, variables need binding at invocation time through the use of [ ] in the rule definition?

neverfox avatar Dec 02 '16 02:12 neverfox

Wait, is this because, like Datomic, variables need binding at invocation time through the use of [ ] in the rule definition?

No, DataScript doesn’t support that. Also I believe it’s just validation check in Datomic, not an actual perf optimization

tonsky avatar Dec 02 '16 10:12 tonsky

I see. Thanks. If you do get a chance, I would appreciate your insight on my example. To get around it, I basically had to run the query without

[?p :sentence/index _]
(anc ?p ?t0)
(anc ?p ?t1)
(anc ?p ?t2)
(anc ?p ?t3)
(anc ?p ?t4)

returning tuples of ?t bindings and then loop through the set of tuples using the above as the where clause. It just doesn't like doing it all at once.

neverfox avatar Dec 02 '16 19:12 neverfox