TypeDB 3.0 Roadmap

Open flyingsilverfin opened this issue 1 year ago • 13 comments

Problem to Solve

We collect the agreed list of changes and requirements that will be in the first version of TypeDB 3.0.

Changes

API

  • https://github.com/vaticle/typedb/issues/6772
  • https://github.com/vaticle/typedb/issues/7019

Driver

  • https://github.com/vaticle/typedb/issues/7020

TypeQL

  • https://github.com/vaticle/typedb/issues/7038
  • https://github.com/vaticle/typedb/issues/7037
  • https://github.com/vaticle/typedb/issues/7021
  • https://github.com/vaticle/typedb/issues/7022
  • https://github.com/vaticle/typedb/issues/7024
  • https://github.com/vaticle/typedb/issues/7028
  • https://github.com/vaticle/typedb/issues/7029
  • https://github.com/vaticle/typedb/issues/7030
  • https://github.com/vaticle/typedb/issues/6765
  • https://github.com/vaticle/typedb/issues/6767

Value restriction:

  • https://github.com/vaticle/typeql/issues/272
  • https://github.com/vaticle/typedb/issues/7023
  • https://github.com/vaticle/typeql/issues/271

Require further discussion:

  • https://github.com/vaticle/typeql/issues/301
  • https://github.com/vaticle/typedb/issues/6896
  • https://github.com/vaticle/typedb/issues/6175
  • https://github.com/vaticle/typedb/issues/5527

Relation implementation

  • https://github.com/vaticle/typedb/issues/6769
  • https://github.com/vaticle/typedb/issues/6771

Changes proposed and rejected:

  • Immutable relations
  • https://github.com/vaticle/typedb/issues/6770

flyingsilverfin avatar Mar 16 '23 17:03 flyingsilverfin

Let's make sure each of them is documented properly in an issue, @flyingsilverfin

haikalpribadi avatar Mar 16 '23 18:03 haikalpribadi

Yes @haikalpribadi that's what the colons are for :D I have to get to that next

flyingsilverfin avatar Mar 17 '23 08:03 flyingsilverfin

Internal changes

A ? indicates not yet fully discussed.

TypeQL

  • [ ] Fix backslash escaping
  • [ ] ? Allow modifiers on match inside of a delete/insert query to allow flexible query operations such as batching
  • [ ] ? Rename MatchQuery to GetQuery. To discuss: how this plays with the above

RPC

  • [ ] Replace the session ID with a long, instead of an inefficient vector

Pattern & Resolvables

  • [ ] Implement the query representation as a set of constraints that own variables, instead of the other way around
  • [ ] Remove the idea of Resolvables (eg. Concludables, etc.) and merge them into Patterns

Concepts

  • [ ] The schema concept layer should cache more aggressively, in a more CPU-friendly format, various shortcuts such as owned attribute types, so they can be read directly without traversing the supertypes as well. Additionally, we should store all schema-level data in flat sorted arrays (likely never exceeding about 100 MB in size even with the largest possible schemas) to optimise access.
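
The caching idea in this item can be illustrated with a minimal Python sketch. It is not TypeDB's implementation: the toy schema, the `SUPERTYPE` and `DECLARED_OWNS` tables, and all function names are hypothetical. It only shows the difference between walking the supertype chain on every lookup and precomputing one flat sorted array per type.

```python
from bisect import bisect_left

# Hypothetical toy schema: type label -> direct supertype (None for the root).
SUPERTYPE = {"entity": None, "person": "entity", "employee": "person"}
# Hypothetical ownerships declared directly on each type.
DECLARED_OWNS = {"entity": [], "person": ["name", "email"], "employee": ["salary"]}


def owned_by_traversal(type_label):
    """Collect owned attribute types by walking up the supertype chain."""
    owned = set()
    current = type_label
    while current is not None:
        owned.update(DECLARED_OWNS[current])
        current = SUPERTYPE[current]
    return owned


def build_owns_cache():
    """Precompute, once, a flat sorted list of owned attribute types per type."""
    return {label: sorted(owned_by_traversal(label)) for label in SUPERTYPE}


def owns(cache, type_label, attribute_label):
    """Check ownership with a binary search on the precomputed flat array."""
    row = cache[type_label]
    i = bisect_left(row, attribute_label)
    return i < len(row) and row[i] == attribute_label


OWNS_CACHE = build_owns_cache()
print(OWNS_CACHE["employee"])
```

After the cache is built, a lookup such as `owns(OWNS_CACHE, "employee", "name")` no longer touches the supertype chain, at the cost of rebuilding the cache whenever the schema changes.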

Traversal & Reasoner

  • [ ] Rearchitect reasoner to manage its own memory
  • [ ] Push Concept to be the bottom layer of the database that traversals and reasoner operate over. Below that can still exist a graph and storage layer, but they should not be exposed
  • [ ] Convert explain into an explain() query that takes a query and bounds. Alternatively, we could just explain the existence of an inferred concept without a query? Also, convert explanations into something more native?
  • [ ] Handle negations in the traversal natively

flyingsilverfin avatar Mar 17 '23 12:03 flyingsilverfin

Thank you guys for working on this and sharing it with us. There are two key features that severely limit our use cases and that I'm missing in this list:

Optionals/fetch

https://github.com/vaticle/typedb/issues/6322 Including a way to have optional played roles also, not only attributes.

Vectors and ordered lists

I see they've been discarded :S, but there is no simple workaround for this: https://github.com/vaticle/typedb/issues/6327

Vectors

We don't need them as a particular attribute type; maybe a @sortable or @indexable annotation when defining relations would work.

Storing something like this in TypeDB is really hard, and mutating it (adding items at particular positions) in a performant way is nearly impossible:

[image]

Ordered lists

This also includes ordered lists with repeated values, which are really hard to store in TypeDB. Ex: [1,2,2,3,7,2] or ['blue', 'green', 'green', 'red']
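
To make the cost of positional inserts concrete, here is a small Python sketch. It assumes one possible workaround, storing each list item as a separate fact with an explicit position; that modelling choice and the names `insert_at` and `as_list` are hypothetical and not taken from TypeDB.

```python
def insert_at(positions, index, value):
    """Insert value at index in a list stored as {position: value} facts.

    Returns the new mapping and the number of existing facts whose
    position had to be rewritten.
    """
    if not 0 <= index <= len(positions):
        raise IndexError("index out of range")
    rewritten = 0
    updated = {}
    for pos, val in positions.items():
        if pos >= index:
            updated[pos + 1] = val  # every later element gets a new position
            rewritten += 1
        else:
            updated[pos] = val
    updated[index] = value
    return updated, rewritten


def as_list(positions):
    """Read the facts back in position order."""
    return [positions[i] for i in range(len(positions))]


colours = {0: "blue", 1: "green", 2: "green", 3: "red"}
new_colours, rewritten = insert_at(colours, 1, "yellow")
print(as_list(new_colours), rewritten)
```

Repeated values are representable this way because the position, not the value, identifies each item, but every insert near the front rewrites almost the whole list.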

lveillard avatar Mar 17 '23 22:03 lveillard

I think the proposal is only to remove repeated roles & players

role1: $player1, role1: $player1

While this would keep working:

role2: $player1, role2: $player2

So it's not about removing the cardinality MANY of roles, but removing the possibility of a player to play the same role multiple times.

So basically, no repetition. Which I agree is not the most common use case out there. But it does happen.

Btw, a workaround for this in the new format would be to create an intermediary entity, "event" for instance, so instead of A<>B & A<>B we would have to do A<>EV1<>B & A<>EV2<>B

lveillard avatar Nov 30 '23 14:11 lveillard

Hi All,

After speaking to Haikal, there are good reasons to move from the Concept API to Fetch, particularly speed. At present it takes between 2 and 5 seconds to retrieve an object from TypeDB and transpile it to valid STIX JSON. This is mainly due to all of the network round trips that have to be done, so clearly one fetch query will be more efficient.

The advantage of our current system is it is shape-based, so I can handle all JSON objects using the same ORM, the disadvantage is speed.

The new approach does mean a lot more code, since we have to build quite long Fetch statements for each individual object (e.g. 16-44 lines for each of our 85 objects), and then build the transpile code (from returned Fetch JSON to STIX JSON). This figure assumes a single main object, 4 lines, and then 3-11 optional sub-objects with relations, with 4 lines each, if we use the class hierarchy. But the benefit will be far greater speed, totally agreed.

We probably won't be able to make this move for some months, due to resourcing, but we agree it will be worth it. At the same time we can update our 2,500 lines of schema code to v3. This will place us in a good position to add another 50-80 cybersecurity objects (e.g. SBOMs, Vulnerabilities, Risk etc.)

Onwards and upwards for TypeDB and our cybersecurity application!!

brettforbes avatar Apr 30 '24 09:04 brettforbes

I hope we will get the same tree structure for mutations. Batch mutations and optional mutations are currently a nightmare, while queries with fetch are so smooth.

A point of enhancement could be to be able to use multiple match fetch in the same query, and same for the mutations, instead of having a single entry point.

This is possible in the nested branches, where we can open multiple ones and assign them to different keys, but it is not possible to have multiple keys at the root level.

lveillard avatar May 02 '24 18:05 lveillard

Another key conceptual blocking point in mutations for us is how cardinality MANY is handled. Whenever the match clauses start doing permutations, the insert / delete are run as in a FOR loop.

This issue has an example of one insertion that is run N times against intuition: https://github.com/vaticle/typedb/issues/6902

In 3.0 I would love to see $vars being aware of their cardinality. The way that Dgraph executes this type of mutation is really intuitive: each variable holds an array of iids, so if a match does something like this

match
$jobPosition isa jobPosition, has id 'frontendDeveloper';
$candidate isa Person, has name 'Junior Peter';
$allInterviewers isa Person, has departMentName 'IT';

insert
$selectionProcess (candidate: $candidate, job: $jobPosition, interviewers: $allInterviewers) isa selectionProcess, has id 'selectionProces1';

This would be run a single time and create a single selectionProcess as expected.

Alternatives:

a) In order to enable FOR loops as they happen now, new syntax for loops could be created, since those are rarer cases.

b) Another alternative would be to indicate the type of cardinality in the roles when defining the schema, so we know which things are treated as arrays and store multiple iids in the $var, and which things follow the current behaviour.

c) Yet another alternative could be to clearly define array variables, for instance doing []allInterviewers isa .... instead of $allInterviewers isa ....
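
The difference between the two behaviours discussed in this comment can be sketched in Python. This is a toy model only, not TypeDB or Dgraph code: the `answers` data, the interviewer names, and both function names are hypothetical. The first function mimics the described "FOR loop" behaviour, with one insert per combination of matched concepts; the second treats selected variables as arrays, in the spirit of alternative c).

```python
from itertools import product

# Hypothetical match results: the concepts each variable matched.
answers = {
    "jobPosition": ["frontendDeveloper"],
    "candidate": ["Junior Peter"],
    "allInterviewers": ["Ana", "Ben", "Cleo"],
}


def insert_per_answer(answers):
    """One insert per combination of matched concepts (the FOR-loop behaviour)."""
    names = list(answers)
    return [dict(zip(names, combo)) for combo in product(*answers.values())]


def insert_once_with_arrays(answers, array_vars):
    """Variables in array_vars carry all of their matches into a single insert."""
    scalar_names = [name for name in answers if name not in array_vars]
    rows = []
    for combo in product(*(answers[name] for name in scalar_names)):
        row = dict(zip(scalar_names, combo))
        for name in array_vars:
            row[name] = list(answers[name])
        rows.append(row)
    return rows


print(len(insert_per_answer(answers)))
print(insert_once_with_arrays(answers, {"allInterviewers"}))
```

With three matching interviewers, the first function produces three inserts, while the second produces a single insert whose `allInterviewers` entry holds all three.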

lveillard avatar May 02 '24 18:05 lveillard

v3.0 is looking awesome, but can you also detail TIME and GPS please

v3.0 is a pretty massive rewrite, and in fact we will probably re-engineer the schema, since originally Tomas adopted the Vaticle style guide and made all of the property names different from the TypeDB ones. The consequence of this is that Fetch statements must be super long to include every property, every sub-object and all of its properties. If the variable names are the same then Fetch would be more powerful and concise.

Still, the powerful new capabilities of v3.0 make it worth this re-engineering, as long as TIME and GPS are sorted. Please provide architectural best practice for these two, thanks

brettforbes avatar May 07 '24 09:05 brettforbes

So after a lot of thought I'm changing my wishlist priorities. My key needed feature is being able to share $vars between different streams. This would fix almost every issue we are facing with mutations and is something enabled in most databases.

As an example:

startTX

   insert
      $book isa Book, has id 1
    ---
    match
       $allAuthors isa Author
    ---
    insert
      $authorship ($book, $allAuthors) isa Authorship
 
endTx
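
One way to read the request above is as a pipeline in which every stage receives the variable bindings produced by the previous one. The Python sketch below models only that reading; the stage functions, the in-memory `store`, and the author identifiers are hypothetical, and it produces one authorship per matched author, which may or may not be what the pseudocode intends.

```python
def run_pipeline(stages, store):
    """Run stages in order; each stage maps one binding row to zero or more rows."""
    rows = [{}]
    for stage in stages:
        rows = [out for row in rows for out in stage(row, store)]
    return rows


def insert_book(row, store):
    """Insert a book and bind it to 'book' for the later stages."""
    store["books"].append(1)
    return [{**row, "book": 1}]


def match_authors(row, store):
    """Extend the incoming row with one row per stored author."""
    return [{**row, "author": author} for author in store["authors"]]


def insert_authorship(row, store):
    """Use variables bound by both earlier stages."""
    store["authorships"].append((row["book"], row["author"]))
    return [row]


store = {"books": [], "authors": ["a1", "a2"], "authorships": []}
rows = run_pipeline([insert_book, match_authors, insert_authorship], store)
print(store["authorships"])
```

The point of the sketch is that `insert_authorship` can refer to `book`, bound by the first stage, without matching it again.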

lveillard avatar May 24 '24 13:05 lveillard

Can LLM Vectors be stored and indexed?

This would be very useful, to store LLM vectors along with entities or relations. Can it be done using structs or lists somehow? LLMs are going to keep getting bigger, so it'll have to be addressed at some stage. We need to connect TypeDB to natural language meaning, which is a vector in the case of LLMs.

brettforbes avatar May 28 '24 05:05 brettforbes

Can LLM Vectors be stored and indexed?

This would be very useful, to store LLM vectors along with entities or relations. Can it be done using structs or lists somehow? LLMs are going to keep getting bigger, so it'll have to be addressed at some stage. We need to connect TypeDB to natural language meaning, which is a vector in the case of LLMs.

It's not just vector storage, but the Approximate Nearest Neighbour search that is also required.
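
For context, the operation that an Approximate Nearest Neighbour index speeds up is a top-k similarity search. The Python sketch below is the exact, brute-force version of that search using cosine similarity. It is not an ANN algorithm and not a TypeDB feature; the `embeddings` data and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, vectors, k=1):
    """Exact top-k keys by cosine similarity; scans every stored vector."""
    ranked = sorted(
        vectors,
        key=lambda key: cosine_similarity(query, vectors[key]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical two-dimensional embeddings; real LLM vectors are far larger.
embeddings = {"cat": [1.0, 0.0], "kitten": [0.9, 0.1], "car": [0.0, 1.0]}
print(nearest([1.0, 0.0], embeddings, k=2))
```

Because this scan touches every vector on every query, it does not scale to large collections, which is why an ANN index is needed in addition to plain vector storage.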

sjpritchard avatar Jun 08 '24 04:06 sjpritchard