TypeDB 3.0 Roadmap
Problem to Solve
We collect the agreed list of changes and requirements that will be in the first version of TypeDB 3.0.
Changes
API
- https://github.com/vaticle/typedb/issues/6772
- https://github.com/vaticle/typedb/issues/7019
Driver
- https://github.com/vaticle/typedb/issues/7020
TypeQL
- https://github.com/vaticle/typedb/issues/7038
- https://github.com/vaticle/typedb/issues/7037
- https://github.com/vaticle/typedb/issues/7021
- https://github.com/vaticle/typedb/issues/7022
- https://github.com/vaticle/typedb/issues/7024
- https://github.com/vaticle/typedb/issues/7028
- https://github.com/vaticle/typedb/issues/7029
- https://github.com/vaticle/typedb/issues/7030
- https://github.com/vaticle/typedb/issues/6765
- https://github.com/vaticle/typedb/issues/6767
Value restriction:
- https://github.com/vaticle/typeql/issues/272
- https://github.com/vaticle/typedb/issues/7023
- https://github.com/vaticle/typeql/issues/271
Require further discussion:
- https://github.com/vaticle/typeql/issues/301
- https://github.com/vaticle/typedb/issues/6896
- https://github.com/vaticle/typedb/issues/6175
- https://github.com/vaticle/typedb/issues/5527
Relation implementation
- https://github.com/vaticle/typedb/issues/6769
- https://github.com/vaticle/typedb/issues/6771
Changes proposed and rejected:
- Immutable relations
- https://github.com/vaticle/typedb/issues/6770
Let's make sure each of them is documented properly in an issue, @flyingsilverfin
Yes @haikalpribadi that's what the colons are for :D I have to get to that next
Internal changes
A ? indicates not yet fully discussed.
TypeQL
- [ ] Fix backslash escaping
? - [ ] Allow modifiers on match inside of a delete/insert query to allow flexible query operations such as batching
? - [ ] Rename MatchQuery to GetQuery. To discuss: how this plays with the above
RPC
- [ ] Replace the session ID with a long, instead of an inefficient vector
Pattern & Resolvables
- [ ] Implement the query representation as a set of constraints that own variables, instead of the other way around
- [ ] Remove the idea of Resolvables (e.g. Concludables, etc.) and merge them into Patterns
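As a rough illustration of "constraints that own variables", here is a minimal Python sketch. The class names (Variable, Isa, Has) and the helper variables_of are hypothetical and are not TypeDB internals; the point is only that the query is a list of constraints, and the variable set is derived from them rather than the other way around.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Isa:
    """Hypothetical constraint: `instance` is an instance of `type_label`."""
    instance: Variable
    type_label: str

    def variables(self):
        return (self.instance,)


@dataclass(frozen=True)
class Has:
    """Hypothetical constraint: `owner` owns `attribute`."""
    owner: Variable
    attribute: Variable

    def variables(self):
        return (self.owner, self.attribute)


def variables_of(constraints):
    """Derive the query's variables from its constraints, in first-seen order."""
    seen = {}
    for constraint in constraints:
        for variable in constraint.variables():
            seen.setdefault(variable.name, variable)
    return list(seen.values())


p, n = Variable("p"), Variable("n")
QUERY = [Isa(p, "person"), Has(p, n)]
```

With this shape, a variable carries no back-references, so constraints can be added, removed, or rewritten without touching the variables themselves.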
Concepts
- [ ] The schema concept layer should cache more aggressively, in a more CPU-friendly format, various shortcuts such as owned attribute types directly, without having to traverse through the supertypes as well. Additionally, we should store all schema-level data in flat sorted arrays (likely never exceeding about 100 MB in size with the largest possible schemas) to optimise access.
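The caching idea above could look roughly like the following Python sketch. The toy schema, the function names, and the tuple-per-type layout are all assumptions made for illustration; the real storage format is not specified in this issue. Each type gets a precomputed, sorted, flat list of every attribute type it owns (inherited ones included), so a lookup is a binary search with no supertype traversal.

```python
from bisect import bisect_left

# Hypothetical toy schema: type label -> (supertype label, directly owned attribute types).
SCHEMA = {
    "thing-root": (None, []),
    "person": ("thing-root", ["name", "email"]),
    "employee": ("person", ["employee-id"]),
}


def flatten_owns(schema):
    """Precompute, per type, a sorted tuple of owned attribute types including inherited ones."""
    cache = {}

    def resolve(label):
        if label in cache:
            return cache[label]
        parent, direct = schema[label]
        inherited = resolve(parent) if parent is not None else ()
        cache[label] = tuple(sorted(set(inherited) | set(direct)))
        return cache[label]

    for label in schema:
        resolve(label)
    return cache


def owns(cache, type_label, attribute_label):
    """Binary search in the flat sorted row; no walk up the type hierarchy."""
    row = cache[type_label]
    index = bisect_left(row, attribute_label)
    return index < len(row) and row[index] == attribute_label


OWNS_CACHE = flatten_owns(SCHEMA)
```

The cache has to be rebuilt whenever the schema changes, which is acceptable if schema writes are rare compared with reads.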
Traversal & Reasoner
- [ ] Rearchitect reasoner to manage its own memory
- [ ] Push Concept to be the bottom layer of the database that traversals and reasoner operate over. Below that can still exist a graph and storage layer, but they should not be exposed
- [ ] Convert explain into an explain() query that takes a query and bounds. Alternatively, we could just explain the existence of an inferred concept without a query? Also, convert explanations into something more native?
- [ ] Handle negations in the traversal natively
Thank you guys for working on this and sharing it with us. Two key features, whose absence severely limits our use cases, are missing from this list:
Optionals/fetch
https://github.com/vaticle/typedb/issues/6322 Including a way to have optional played roles also, not only attributes.
Vectors and ordered lists
I see they've been discarded :S, but there is no simple workaround for this: https://github.com/vaticle/typedb/issues/6327
Vectors
We don't need them as a particular attribute type; maybe a @sortable or @indexable annotation when defining relations would work.
Storing something like this in TypeDB is really hard, and mutating it (adding items at particular positions) in a performant way is nearly impossible.
Ordered lists
This also includes ordered lists with repeated values, which are really hard to store in TypeDB.
Ex: [1,2,2,3,7,2] or ['blue', 'green', 'green', 'red']
I think the proposal is only to remove repeated roles & players
role1: $player1, role1: $player1
While this would keep working:
role2: $player1, role2: $player2
So it's not about removing the cardinality MANY of roles, but removing the possibility of a player to play the same role multiple times.
So basically, no repetition. Which I agree is not the most common use case out there. But it does happen.
Btw, a workaround for this in the new format would be to create an intermediary entity, "event" for instance, so instead of A<>B & A<>B we would have A<>EV1<>B & A<>EV2<>B.
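To make the workaround concrete, here is a small Python sketch. It models links as a plain set of pairs, which is an assumption for illustration and not how TypeDB stores relations; the names connect and count_links are hypothetical. A repeated A<>B link collapses in a set, while routing each occurrence through its own event keeps both.

```python
# Direct links as a set: the repeated A<>B pair collapses into one element.
DIRECT_LINKS = {("A", "B"), ("A", "B")}


def connect(edges, a, b, occurrence):
    """Record one occurrence of a<>b through its own intermediary event."""
    event = f"EV{occurrence}"
    edges.add((a, event))
    edges.add((event, b))
    return event


def count_links(edges, a, b):
    """Count how many distinct events connect a to b."""
    reached_from_a = {dst for src, dst in edges if src == a}
    leading_to_b = {src for src, dst in edges if dst == b}
    return len(reached_from_a & leading_to_b)


EVENT_LINKS = set()
connect(EVENT_LINKS, "A", "B", 1)  # A<>EV1<>B
connect(EVENT_LINKS, "A", "B", 2)  # A<>EV2<>B
```

The cost is one extra entity and two links per occurrence instead of a single repeated role player.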
Hi All,
After speaking to Haikal, there are good reasons to move from the Concept API to Fetch, particularly speed. At present it takes between 2 and 5 seconds to retrieve an object from TypeDB and transpile it to valid Stix JSON. This is mainly due to all of the network roundtrips that have to be done, so clearly one fetch query will be more effective.
The advantage of our current system is that it is shape-based, so I can handle all JSON objects using the same ORM; the disadvantage is speed.
The new approach does mean a lot more code, since we have to build quite long Fetch statements for each individual object (e.g. 16-44 lines for each of our 85 objects), and then build the transpile code (from returned Fetch JSON to Stix JSON). This figure assumes a single main object, 4 lines, and then 3-11 optional sub-objects with relations, with 4 lines each, if we use the class hierarchy. But the benefit will be far greater speed, totally agreed.
We probably won't be able to make this move for some months, due to resourcing, but we agree it will be worth it. At the same time we can update our 2,500 lines of schema code to v3. This will place us in a good position to add another 50-80 cybersecurity objects (e.g. SBOMs, Vulnerabilities, Risk etc.)
Onwards and upwards for TypeDB and our cybersecurity application!!
I hope we will get the same tree structure for mutations. Batch mutations and optional mutations are currently a nightmare, while queries with fetch are so smooth.
A point of enhancement could be to be able to use multiple match fetch in the same query, and same for the mutations, instead of having a single entry point.
This is possible in the nested branches, where we can open multiple ones and assign them to different keys, but it is not possible to have multiple keys at the root level.
Another key conceptual blocking point in mutations for us is how cardinality MANY is handled. Whenever the match clauses start doing permutations, the insert / delete are run as in a FOR loop.
This issue has an example of one insertion that is run N times against intuition: https://github.com/vaticle/typedb/issues/6902
In 3.0 I would love to see $vars being aware of their cardinality. The way that dgraph executes this type of mutation is really intuitive: each variable holds an array of iids, so if a match does something like this
match
$jobPosition isa jobPosition, has id 'frontendDeveloper';
$candidate isa Person, has name 'Junior Peter';
$allInterviewers isa Person, has departMentName 'IT';
insert
$selectionProcess (candidate: $candidate, job: $jobPosition, interviewers: $allInterviewers) isa selectionProcess, has id 'selectionProces1';
This would be run a single time and create a single selectionProcess as expected.
Alternatives: a) in order to enable FOR loops as they happen now, new syntax for loops could be created, since those are the rarer cases.
b) Another alternative would be to indicate the type of cardinality in the roles when defining the schema, so we know which things are treated as arrays and store multiple iids in the $var, and which things follow current behaviour
c) Yet another alternative could be to clearly define array variables, for instance doing []allInterviewers isa .... instead of $allInterviewers isa ....
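The difference between the two behaviours can be sketched in plain Python. This is only a model of the semantics described in this comment, not TypeDB code; the interviewer names and both function names are made up for the example. The first function mimics an insert that runs once per match answer (the Cartesian product of all variable bindings), the second mimics variables that each hold all of their matches.

```python
from itertools import product

JOB_POSITIONS = ["frontendDeveloper"]
CANDIDATES = ["Junior Peter"]
INTERVIEWERS = ["Ana", "Bo", "Cy"]  # hypothetical members of the IT department


def insert_per_answer(jobs, candidates, interviewers):
    """Behaviour described as current: one insert per combination of bindings."""
    return [
        {"job": job, "candidate": candidate, "interviewers": [interviewer]}
        for job, candidate, interviewer in product(jobs, candidates, interviewers)
    ]


def insert_once_with_arrays(jobs, candidates, interviewers):
    """Behaviour asked for: each variable holds all its matches, the insert runs once."""
    return [
        {
            "job": list(jobs),
            "candidate": list(candidates),
            "interviewers": list(interviewers),
        }
    ]


PER_ANSWER = insert_per_answer(JOB_POSITIONS, CANDIDATES, INTERVIEWERS)
ONCE = insert_once_with_arrays(JOB_POSITIONS, CANDIDATES, INTERVIEWERS)
```

With three interviewers, the per-answer version yields three selection processes with one interviewer each, while the array version yields one selection process with all three.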
v3.0 is looking awesome, but can you also detail TIME and GPS please
V3.0 is a pretty massive rewrite, and in fact we will probably re-engineer the schema, since originally Tomas adopted the Vaticle style guide and made all of the property names different from the TypeDB ones. The consequence of this is that Fetch statements must be super long to include every property, every sub-object and all of its properties. If the variable names are the same then Fetch would be more powerful and concise.
Still, the powerful new capabilities of v3.0 make it worth this re-engineering, as long as TIME and GPS are sorted. Please provide architectural best practice for these two, thanks
So after a lot of thought I'm changing my wishlist priorities. My key needed feature is being able to share $vars between different streams. This would fix almost every issue we are facing with mutations and is something enabled in most databases.
As an example:
startTX
insert
$book isa Book, has id 1
---
match
$allAuthors isa Author
---
insert
$authorship ($book, $allAuthors) isa Authorship
endTx
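One way to picture the three stages above sharing variables is a pipeline in which every stage reads the bindings produced so far and contributes its own. The Python sketch below is a toy model under that assumption; the stage functions, the author ids, and run_transaction are hypothetical and say nothing about how TypeDB would implement it.

```python
def insert_book(bindings):
    """Stage 1: create the book and bind it."""
    return {"book": {"type": "Book", "id": 1}}


def match_authors(bindings):
    """Stage 2: bind every stored author (hypothetical data)."""
    return {"allAuthors": ["author-1", "author-2"]}


def insert_authorship(bindings):
    """Stage 3: use the variables bound by the two earlier stages."""
    return {
        "authorship": {
            "type": "Authorship",
            "book": bindings["book"]["id"],
            "authors": list(bindings["allAuthors"]),
        }
    }


def run_transaction(stages):
    """Run stages in order inside one transaction, accumulating shared bindings."""
    bindings = {}
    for stage in stages:
        bindings.update(stage(bindings))
    return bindings


RESULT = run_transaction([insert_book, match_authors, insert_authorship])
```

The third stage fails with a KeyError if an earlier stage did not bind what it needs, which mirrors the requirement that a shared variable be defined before it is used.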
Can LLM Vectors be stored and indexed?
This would be very useful for storing LLM vectors along with entities or relations. Can it be done using structs or lists somehow? LLMs are going to keep getting bigger, so it will have to be addressed at some stage. We need to connect TypeDB to natural language meaning, which is a vector in the case of LLMs.
It's not just vector storage, but the Approximate Nearest Neighbour search that is also required.
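For reference, the search being asked for can be sketched as an exact, brute-force nearest-neighbour lookup in Python. This is the baseline that Approximate Nearest Neighbour indexes trade accuracy against for speed; it is not an ANN algorithm itself, and the entity ids and two-dimensional vectors are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, stored, k=1):
    """Exact search: score every stored vector and return the ids of the k most similar."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [entity_id for entity_id, _ in ranked[:k]]


# Hypothetical embeddings stored alongside entities.
EMBEDDINGS = [
    ("entity-a", [1.0, 0.0]),
    ("entity-b", [0.0, 1.0]),
    ("entity-c", [0.7, 0.7]),
]
```

This scan is linear in the number of stored vectors, which is why a dedicated index is needed once the collection grows large.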