typedb
typedb copied to clipboard
Equivalent count queries have very different execution time. Negation vs disjunction.
Description
We can count the number of people without an email address or a phone number in two ways.
- By executing the following query:
match $p isa person; not { $p has phone_number $pn; }; not {$p has email_address $ea;}; count;
- Or, by first counting the number of people:
match $p isa person; count;
and then deducting the result of the following query from it:
match $p isa person; { $p has phone_number $pn; } or {$p has email_address $ea;}; count;
The second approach takes about 10% the time of the first. The numbers from each count (before the deduction) are very similar (around 8000 out of a total 17000 people).
I don't know if this points to a serious underlying issue, or if it's just an unusual edge-case.
Environment
- OS - Windows
- TypeDB version 2.10.0
Reproducible Steps
Load the following schema:
define
person sub entity,
owns email_address,
owns phone_number;
person plays provenance:owner;
email_address sub sourced_string, value string;
phone_number sub sourced_string, value string;
sourced_string sub attribute, value string, abstract, plays provenance:object;
provenance sub relation, relates object, relates owner, relates source;
data_source sub entity, owns named_source, owns date_updated, plays provenance:source;
named_source sub attribute, value string;
date_updated sub attribute, value datetime;
Execute the queries according to the description.
Expected Output
Both queries to take the same amount of time.
Just to be sure - the query in 2. is asking for avatar not person?
Just to be sure - the query in
2.is asking foravatarnotperson?
Sorry no that was incorrect, I've fixed it.
Can we provide the minimal schema to reproduce this issue, @thomaschristopherking ?
Can we provide the minimal schema to reproduce this issue, @thomaschristopherking ?
Sorry just saw this, was on vacation. Added a schema. Might have more than needed to cause this issue (the attribute relations) but it matches what we have.
@thomaschristopherking do you mind testing this out on 2.12.0 please?
Reproduced with 2.13.0.
This is likely due to the way the traversal engine handles negations - it finds all answers to the unnegated section (in this case $p isa person; and then executes a traversal of the negated pattern against each answer - leading to $2n$ extra traversals.
In contrast, the disjunction only evaluates two conjunctions and avoids the overhead.
We've created #6725 as a more direct issue for the cause.
Since we have a new issue, I'll close this one as "duplicate", and we track the new one.