Support Temporal Versioning

Open zprobst opened this issue 10 months ago • 0 comments

Is your feature request related to a problem? Please describe.

Versioning lets us track changes by querying over a range of time and by inspecting the database's state at various points in time. This enables useful operations such as what-if analysis.

Describe the solution you'd like

Ideally, nodestream would introduce a way to version nodes and relationships. From a developer's perspective, the workflow could look something like this:

  • Create a pipeline with nodes and relationships like normal. For example, a pipeline that connects User and Post nodes through a LIKES relationship.
  • Run the pipeline like normal: nodestream run likes-pipeline --target my-db
  • Run a CLI command to generate versioning configuration as a new migration for the chosen node and relationship types: nodestream version User Post LIKES. Note that you do not need to version the entire schema.
  • Run the migrations: nodestream migrations run --target my-db
  • Run the pipeline like normal again (nodestream run likes-pipeline --target my-db) and get a versioned schema (defined below).

Relationship Versioning

[Figure: Nodestream Temporal Versioning Schema (1)]

Relationship versioning can be accomplished by changing the relationship type from RELATED_TO to OLD_RELATED_TO. Each relationship can be assigned a from property that records the point in time at which it was established. When a relationship expires, we add the OLD_ prefix to its type and set a to timestamp that records the point at which it was dissolved. NOTE: this differs from other versioning schemes by using the OLD_ prefix, which allows "current state" queries to remain unchanged.
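To make the scheme concrete, here is a minimal in-memory sketch in Python. It is not nodestream code and does not talk to a database. The `Relationship` class and the `expire`, `current`, and `as_of` helpers are hypothetical names, and treating the validity window as half-open (`from` inclusive, `to` exclusive) is an assumption that the proposal does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional

OLD_PREFIX = "OLD_"


@dataclass
class Relationship:
    source: str
    target: str
    type: str
    from_ts: int                 # the `from` property (`from` is a Python keyword)
    to_ts: Optional[int] = None  # the `to` property, set only on expiry


def expire(rel: Relationship, now: int) -> None:
    """Mark a relationship as dissolved: add the OLD_ prefix and stamp `to`."""
    if rel.type.startswith(OLD_PREFIX):
        return  # already expired, nothing to do
    rel.type = OLD_PREFIX + rel.type
    rel.to_ts = now


def current(rels: List[Relationship], rel_type: str) -> List[Relationship]:
    """A "current state" query: match the plain type, no time filter needed."""
    return [r for r in rels if r.type == rel_type]


def as_of(rels: List[Relationship], rel_type: str, ts: int) -> List[Relationship]:
    """A point-in-time query: also consider expired relationships valid at `ts`."""
    return [
        r for r in rels
        if r.type in (rel_type, OLD_PREFIX + rel_type)
        and r.from_ts <= ts
        and (r.to_ts is None or ts < r.to_ts)  # assumed half-open window
    ]


rels = [
    Relationship("alice", "post-1", "LIKES", from_ts=10),
    Relationship("bob", "post-1", "LIKES", from_ts=20),
]
expire(rels[0], now=30)  # alice's LIKES is dissolved at t=30
```

After the call to `expire`, `current(rels, "LIKES")` only sees bob's relationship, while `as_of(rels, "LIKES", 25)` still sees both. That is the property the OLD_ prefix is meant to provide: existing queries on LIKES keep returning the current state without any added time predicates.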

Node (Property) Versioning

Node versioning can be built on top of relationship versioning. Each state change is represented as a new version node, and the current state is stored both on the entity node itself and on the latest version node. This allows you to match an entity and then retrieve its state at a given point in time. Similarly, expired nodes can be marked via the Old prefix, such as OldNodeType.

[Figure: Nodestream Temporal Versioning Schema]
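The node versioning idea can be sketched the same way. The following Python model is illustrative only: `Entity`, `Version`, `update`, `state_at`, and `expire` are hypothetical names rather than nodestream APIs, and the half-open validity window on each version is an assumption. In a real graph the versions list would be version nodes linked to the entity by versioned relationships.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Version:
    properties: Dict[str, Any]
    from_ts: int
    to_ts: Optional[int] = None  # None while this is the latest version


@dataclass
class Entity:
    label: str
    key: str
    properties: Dict[str, Any] = field(default_factory=dict)
    versions: List[Version] = field(default_factory=list)
    expired: bool = False

    def update(self, changes: Dict[str, Any], now: int) -> None:
        """Record a state change as a new version node."""
        if self.versions:
            self.versions[-1].to_ts = now  # close the previous version
        self.properties = {**self.properties, **changes}
        # The latest version mirrors the state held on the entity itself.
        self.versions.append(Version(dict(self.properties), from_ts=now))

    def state_at(self, ts: int) -> Optional[Dict[str, Any]]:
        """Return the properties in effect at `ts`, or None if there were none."""
        for version in self.versions:
            if version.from_ts <= ts and (version.to_ts is None or ts < version.to_ts):
                return version.properties
        return None

    def expire(self, now: int) -> None:
        """Retire the entity: close the latest version and add the Old prefix."""
        if self.expired:
            return
        if self.versions:
            self.versions[-1].to_ts = now
        self.label = "Old" + self.label
        self.expired = True


user = Entity("User", "alice")
user.update({"name": "Alice"}, now=10)
user.update({"name": "Alice B."}, now=20)
```

Here `user.properties` always holds the current state, so a plain match on User needs no time logic, while `user.state_at(15)` walks the version history to recover the earlier name.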

Describe alternatives you've considered

There are several other possible alternatives to the temporal versioning itself.

  • Don't do temporal versioning at all.
  • Save snapshots of the graph at an interval. This rules out several use cases, including more granular look-backs, comparisons from a point in time through now, and more. If you need finer granularity, you could effectively event source nodestream: start at some snapshot and then replay events up to the relevant point in time.
  • Save the data that is relevant off of the graph in an object store.
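For comparison, the snapshot-plus-event-replay alternative mentioned in the list above could look roughly like the following Python sketch. The `Graph` and `Event` shapes and the `replay` function are hypothetical and chosen only for illustration. The sketch assumes events are simple property updates keyed by node and that an event stamped exactly at the snapshot time is already reflected in the snapshot.

```python
from typing import Any, Dict, List, Tuple

Graph = Dict[str, Dict[str, Any]]        # node key -> properties
Event = Tuple[int, str, Dict[str, Any]]  # (timestamp, node key, property changes)


def replay(snapshot: Graph, snapshot_ts: int, events: List[Event], until: int) -> Graph:
    """Rebuild the graph state at `until` from a snapshot taken at `snapshot_ts`."""
    if until < snapshot_ts:
        raise ValueError("target time precedes the snapshot")
    # Copy so the stored snapshot is never mutated.
    state = {key: dict(props) for key, props in snapshot.items()}
    for ts, key, changes in sorted(events, key=lambda event: event[0]):
        if snapshot_ts < ts <= until:
            state.setdefault(key, {}).update(changes)
    return state


snapshot = {"alice": {"name": "Alice"}}
events = [
    (15, "alice", {"name": "Alice B."}),
    (25, "bob", {"name": "Bob"}),
]
state_at_20 = replay(snapshot, snapshot_ts=10, events=events, until=20)
```

The trade-off is visible even at this scale: every look-back costs a replay from the nearest earlier snapshot, whereas the versioned schema answers the same question with a single time-bounded query.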

Prior Art

  • https://medium.com/neo4j/keeping-track-of-graph-changes-using-temporal-versioning-3b0f854536fa
  • https://community.neo4j.com/t/timeseries-network-graph/44662/3

zprobst • Mar 29 '24 21:03