typedb Equivalent count queries have very different execution time. Negation vs disjunction.

Description

We can count the number of people without an email address or a phone number in two ways.

By executing the following query:

match $p isa person; not { $p has phone_number $pn; }; not {$p has email_address $ea;}; count;

Or, by first counting the number of people:

match $p isa person; count;

and then deducting the result of the following query from it:

match $p isa person; { $p has phone_number $pn; } or {$p has email_address $ea;}; count;

The second approach takes about 10% the time of the first. The numbers from each count (before the deduction) are very similar (around 8000 out of a total 17000 people).

I don't know if this points to a serious underlying issue, or if it's just an unusual edge-case.

Environment

OS - Windows
TypeDB version 2.10.0

Reproducible Steps

Load the following schema:

define

person sub entity,
    owns email_address,
    owns phone_number;

person plays provenance:owner;

email_address sub sourced_string, value string;
phone_number sub sourced_string, value string;

sourced_string sub attribute, value string, abstract, plays provenance:object;
provenance sub relation, relates object, relates owner, relates source;

data_source sub entity, owns named_source, owns date_updated, plays provenance:source;
named_source sub attribute, value string;
date_updated sub attribute, value datetime;

Execute the queries according to the description.

Expected Output

Both queries to take the same amount of time.

Jun 01 '22 10:06 thomaschristopherking

Just to be sure - the query in 2. is asking for avatar not person?

Jun 06 '22 13:06 flyingsilverfin

Just to be sure - the query in 2. is asking for avatar not person?

Sorry no that was incorrect, I've fixed it.

Jun 06 '22 13:06 thomaschristopherking

Can we provide the minimal schema to reproduce this issue, @thomaschristopherking ?

Jun 06 '22 20:06 haikalpribadi

Can we provide the minimal schema to reproduce this issue, @thomaschristopherking ?

Sorry just saw this, was on vacation. Added a schema. Might have more than needed to cause this issue (the attribute relations) but it matches what we have.

Jun 20 '22 09:06 thomaschristopherking

@thomaschristopherking do you mind testing this out on 2.12.0 please?

Nov 08 '22 15:11 flyingsilverfin

Reproduced with 2.13.0.

Jan 10 '23 13:01 dmitrii-ubskii

This is likely due to the way the traversal engine handles negations - it finds all answers to the unnegated section (in this case $p isa person; and then executes a traversal of the negated pattern against each answer - leading to $2n$ extra traversals. In contrast, the disjunction only evaluates two conjunctions and avoids the overhead.

We've created #6725 as a more direct issue for the cause.

Jan 10 '23 15:01 krishnangovindraj

Since we have a new issue, I'll close this one as "duplicate", and we track the new one.

Jan 11 '23 12:01 haikalpribadi