typedb
typedb copied to clipboard
Allow retrieving attributes optionally, without filtering the query
Problem to Solve
In my code objects/entities have many optional attributes and when I try to fetch them I need to fetch all of the attributes from the server (adding them directly to the query will filter the entity if they are missing) and then use only a few which is very inefficient (e.g., A user have many attributes which are optional, but in my code, I have MinimizedUser object that has only a few details that can be shared. Fetching the whole object seems like a waste but it the only way when some of the attributes may not exist).
Current Workaround
@flyingsilverfin suggested
match $x iid 0x123; $x has $a; { $a isa [attr-1]; } or { $a isa [attr-2]; } or ...;
It is a valid solution but it creates huge queries that can only be produced by a machine.
Proposed Solution
I think the ideal solution is to separate by somewhat the MATCH and the GET parts.
e.g.,
match $x isa person, has email "[email protected]";
get $x.username, $x.email, $x.fullname;
where a dot signifies an attribute (this can be extended further but I don't think it is required). This allows me to write short and concise queries which filter only what I want and send back only what I need.
A less optimal solution is to add something like optionallyHas/mayHave keyword, which is none-filterable therefore a query will look something like:
match $x isa person, has email "[email protected]", has username $username, mayHave fullname $fullname;
get $username, $fullname;
We need this as well. At the moment we're matching on the entity e.g., get $x (the person) and then using get_has in Python, which seems unnecessary. My preferred solution is the first one proposed, being able to get optional properties without constraining the query.
Yes, I see this is really valuable. I'll triage this to add the roadmap.
I've also updated the issue title to be a bit clearer
Can this be extended to optional roles?
Consider I may have a relationship where two roles are required, and two are optional. Some data records have four roles, some three, and some two. How then to retrieve all records? e.g.
match $a (required1: $req1, required2: $req2, optional1: $opt1, optional2: $opt2) isa sparse_relation
In the scenario listed above, this match will only retrieve a single record. How can I retrieve all relations, with their data, regardless of how many optional roles are filled?
Can this be extended to optional roles?
Consider I may have a relationship where two roles are required, and two are optional. Some data records have four roles, some three, and some two. How then to retrieve all records? e.g.
match $a (required1: $req1, required2: $req2, optional1: $opt1, optional2: $opt2) isa sparse_relation
In the scenario listed above, this match will only retrieve a single record. How can I retrieve all relations, with their data, regardless of how many optional roles are filled?
I believe by the same principle: uncouple the MATCH and the GET.
It may look something like:
...
match $a (required1: $req1, required2: $req2) isa sparse_relation;
get $a.required1, $a.required2, $a.optional1, $a.optional2;
Where the result will always return required1 and required2 and may return optional1, optional2
It is always possible to take it much further and have something like the following where I used the . (dot) for attribute and -> (arrow) for roles to "step to" and also chained it a bit
match $a (required1: $req1, required2: $req2) isa sparse_relation;
$req1 has username;
$req2 has username;
get $a->required1.username, $a->required2.username, $a->optional1.username, $a->optional2.location_name;
In this conceptual example, you will always get together two usernames and sometimes another username and a location name.
At this stage, this is only a feature request to it can contain various other ideas of other ways to reach the same goal.
High level design plan (implementation date: not decided) -
We introduce a new optional { }
clause (similar to not
or or
) that will possibly populate the answer field ConceptMap.optional
, which is a map in itself.
Example:
match $p isa person, has email "[email protected]"; optional { $p has attr-1 $a1; }; optional { $p has attr-2 $a2; };
=>
{ $p : 0x123, optional { $a1: 0x234, $a2: 0x345}}
{ $p : 0x222, optional {}}
...
You can have multiple optional { }
blocks, each of which can internally contain an arbitrary query again.
Open questions:
- what are the semantics of nested optionals?
- What about optionals in negations or disjunctions?
- To be consistent, variables written in disjunctions that are not currently returned should also return in an
optional
block? Eg.
match $p isa person, has email "@[email protected]"; { $p has name $n; } or { $p has age $a; };
currently returns neither $n
nor $a
because these are not scoped in the top level query. However, in optional
blocks we actually do return the variables not scoped in the top level!
=> this may precipitate a redesign of a ConceptMap
into different components.
We may try to end up being able to have explainable answers at all times -- currently when explain()
is enabled, we ignore the get
filter and return all variables, including anonymous ones. This could for example be part of ConceptMap.Explainable.completeAnswer
, and we can continue respecting the get
filter for the map in the main part of the answer.
Other effects:
Defaults can be implemented in the application layer by checking whether a field is present in the ConceptMap.optional
.
Will this work for a situation like this one?
- An entity with 1000 attributes
- I want to query all the instances of the entity and 100 of those attributes, where some of them have it some don't.
The has?
could achieve that. I also like the get $a->required1.username
(@maydanw not sure why you add the required
part 🤔) but I must admit i don't understand the optional {}
With @maydanw's proposal has
would be ALWAYS a constraint, and the get notation would just grab extra attributes (required or not). They change absolutely nothing in the constraints, so i don't see the point of adding them in the constraints...
I guess adding them in the constraints open the door to use them in other lines, but that would be an extra step... And probably way more complicated to code than just "ok, now that you got the result, just grab the values of these X attributes"
So just another extra idea of a different way by adding an extra type of query 'fetch'
Examples:
example match with optional attributes:
match $e isa employee; #constraints
fetch $e has name $name, has email $email;
example deletion where we want to delete attributes if they are there:
match $e isa employee; #constraints
fetch $e has name $name, has email $email;
delete $e has $name, has $email
This would be another way to treat constraints and fetches as different things
We could also extend optionals to include them in insert or delete blocks:
match
$p ... <conditions...: optional { $p has name $a1; }; optional { $r1 ($p, $q) isa rel-type;}; optional { $new-attr "value" isa attribute; }
delete
optional { $p has $a1; }; optional { $r1 isa relation; }
insert
optional { $p has $new-attr; }
In inserts, the optionals should execute if all required input variables are defined. Match optionals should return a value for each variable if they find matching data.
We could also extend optionals to include them in insert or delete blocks:
match $p ... <conditions>...: optional { $p has name $a1; }; optional { $r1 ($p, $q) isa rel-type;}; optional { $new-attr "value" isa attribute; } delete optional { $p has $a1; }; optional { $r1 isa relation; } insert optional { $p has $new-attr; }
In inserts, the optionals should execute if all required input variables are defined. Match optionals should return a value for each variable if they find matching data.
If I understood correctly, this will allow sending updates without the match query beforehand which can be beneficial yet somewhat dangerous. In such a case the returned value should be very clear what indeed updated.
We could also extend optionals to include them in insert or delete blocks:
An alternative to insert the optionals inside match / delete / insert would be to do it the other way around, which opens a lot of possibilities:
Example 1:
Unlink all players from two optional roles, and replace one of those if it has anything linked to it. As an extra, delete an attribute that is also optional and will not break the stream if it is not there:
match
$r ($roleType:$player) isa process, has id in [1,2];
#unlink in one single query, all players of two optional roles in a relation
optional {
match
$roleType type process:interviews;
delete
$r (interview: $player)
}
optional {
match
$roleType type process:candidate;
delete
$r ( candidate: $player);
}
# Second role is actually replaced (or adding a first one if there is none linked)
insert
$newCandidate isa Person, has id 'Ann';
$r ( candidate: $newCandidate);
# also $r has an optional attribute that will be deleted if it is there
optional {
match
$r has optionalAttribute $att;
delete
$r has $att
}
Example 2:
A second example where optional is used from the root and where variables are shared even between siblings:
#parelel optionals in the root, with shared variables
optional {
match
$p1 isa process has id 1;
insert
$j1 isa job, has id 'j1';
$c isa candidate, has name 'Bob';
$p1 (job: $j1, candidate:$c);
}
optional {
match
$p2 isa process has id 2;
insert
$j2 isa job, has id 'j2';
# As $c is defined in a sibling, it might be empty. No problem, if it is the case, only $j2 is linked.
$p2 (job: $j2, candidate:$c);
}
This would help to create independent queries (so the match clauses of one thread does not break all the inserts/deletions) while keeping a way to use the variables in multiple places
Note: with TypeQL Fetch queries implemented in 2.25.x (https://github.com/vaticle/typedb/pull/6888), many optionality issues will be resolved, although not necessarily all.
The issue with the Fetch model is it destroys the value of the classing system, since every object and variable must be named explicitly so it can then be mapped to json in the appropriate clause.
So if I have 10 variations of a single object, each with approximately 100 lines of typeql import, I require 500-1,000 lines of typeql to explicitly name every single variable, every single sub-object , for every single variation and then provide its json mapping.
Alternatively I can use 10- lines of typeql to match/get the concept using the power of the TypeQL class system. The problem at the moment is that I must know the structure of the object in question up-front in order to understand exactly which optional clauses to include.
Enabling optionals would empower interactive TypeQL which essentially becomes useless with the Fetch->JSON idea, as the clause lengths are far too long to input manually, and are insanely fragile (i.e. one wrong line, or one change, and the whole thing will collapse)
Fetch is only useful with simple objects where the typeql name is the same as the json export name, thus reducing the amount of boilerplate.
Optionals are essential
We decided to test out typedb as a possible replacement of our existing DB and almost immediately ran into this issue within the first hour of playing around.
We first hit this issue when trying a basic query to pull all nodes of a certain type and their attributes (some of which were optional), which we easily worked around using what I assume is the equivalent to SQL's *
operator, $node has attribute $attr;
(if this is not the typical pattern please correct me!)
However, we ran into this issue again while testing out schema changes, and this one did confuse us a bit as the website and docs advertise how seamless they are. We extended a schema with a new field that we need in certain cases, for certain objects of the type. When we try to include this attribute in query results, we only receive objects that have this attribute set. Unlike a typical SQL database which essentially implicitly assigns the attribute with a null or default value to each item, it seems like we would have to do this ourselves in typedb and manually update each object in the database with an empty value for the attribute - or, of course, write multiple queries and join the results, or possibly some more complicated query that can do it all in one shot. The website/docs advertise that schema changes don't require rewriting of queries, which is true - but having to either write multiple queries, one unnecessarily-complex query, or mass manual database writes when writing your new queries that make use of your schema changes seems like the hidden cost of the advertised benefit, which may or may not outweigh it.
Again, we are very new here so please let me know if we're doing anything silly or seem to be misunderstanding something. Either way, I do hope some form of implementation of ideas here is implemented - and even more so if the issue I described above is indeed accurate/sane.
hey @pb376 its hard to make a comment without seeing your schema, but @flyingsilverfin has shown that a fetch statement can be constructed to return optional parts of objects
So the function of returning all of the required data, even if one is not sure what to put in the query., is solved mostly by the Fetch query
@pb376 your expectations of what TypeDB can do based on what's advertised on the website is accurate. However I think you are using the wrong TypeQL query. match
is a "pattern matching" clause and get
retrieves everything you "match". If you want to optionally fetch things related to the pattern you matched, such as select *
from SQL, you should the match-fetch
query.
We will review our docs to make sure this will be clear going forward.
Can you test your typedb application with the fetch query and see if it's what you expected?
Also @pb376 can you tell us which guides you followed in the docs initially?