nodestream
Support Temporal Versioning
Is your feature request related to a problem? Please describe.
Versioning allows us to keep track of change by querying over a range of time and assessing the database's state at various points in time. This allows for useful operations such as what-if analysis.
Describe the solution you'd like
Ideally, `nodestream` can introduce a change that allows nodes and relationships to be versioned. From a developer's perspective, you could do something like this:
- Create a pipeline with nodes and relationships like normal. For example, a pipeline that connects `User` and `Post` nodes through a `LIKES` relationship.
- Run the pipeline like normal (`nodestream run likes-pipeline --target my-db`).
- Run a CLI command to generate versioning configuration as a new migration for the node and relationship types. Note that you do not need to version the entire schema: `nodestream version User Post LIKES`
- Run the migrations: `nodestream migrations run --target my-db`
- Run the pipeline like normal, `nodestream run likes-pipeline --target my-db`, and get a versioned schema (defined below).
Relationship Versioning
Relationship versioning can be accomplished by changing the relationship type from `RELATED_TO` to `OLD_RELATED_TO`. Each relationship can be assigned a `from` property that represents the point in time at which the relationship was established. When a relationship expires, we add the `OLD_` prefix and set a `to` timestamp that records the point at which the relationship was dissolved. NOTE: this varies from other versioning schemes by establishing the `OLD_` prefix, which allows "current state" queries to remain unchanged.
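As a rough illustration of the scheme above, here is a minimal in-memory sketch in Python. It is not nodestream code: the `Graph` and `Relationship` classes, the `from_ts`/`to_ts` field names (standing in for the `from` and `to` properties, since `from` is a Python keyword), and the integer timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

OLD_PREFIX = "OLD_"  # prefix applied to expired relationship types


@dataclass
class Relationship:
    source: str
    target: str
    type: str
    from_ts: int                  # the `from` property: when it was established
    to_ts: Optional[int] = None   # the `to` property: set only on expiry


class Graph:
    """Hypothetical in-memory stand-in for the target graph database."""

    def __init__(self) -> None:
        self.relationships: List[Relationship] = []

    def relate(self, source: str, target: str, rel_type: str, now: int) -> Relationship:
        rel = Relationship(source, target, rel_type, from_ts=now)
        self.relationships.append(rel)
        return rel

    def expire(self, rel: Relationship, now: int) -> None:
        if rel.type.startswith(OLD_PREFIX):
            return  # already expired
        rel.type = OLD_PREFIX + rel.type
        rel.to_ts = now

    def current(self, rel_type: str) -> List[Relationship]:
        # "Current state" query: identical to what it would be without versioning.
        return [r for r in self.relationships if r.type == rel_type]

    def as_of(self, rel_type: str, ts: int) -> List[Relationship]:
        # Point-in-time query: consider both live and expired relationships.
        result = []
        for r in self.relationships:
            if r.type not in (rel_type, OLD_PREFIX + rel_type):
                continue
            if r.from_ts <= ts and (r.to_ts is None or ts < r.to_ts):
                result.append(r)
        return result


graph = Graph()
like = graph.relate("alice", "post-1", "LIKES", now=10)
graph.relate("bob", "post-1", "LIKES", now=20)
graph.expire(like, now=30)

current_likers = [r.source for r in graph.current("LIKES")]
likers_at_25 = [r.source for r in graph.as_of("LIKES", 25)]
likers_at_35 = [r.source for r in graph.as_of("LIKES", 35)]
```

The point of the sketch is that `current` never inspects timestamps or the prefix, which mirrors the claim that existing queries keep working, while `as_of` has to look at both the plain and the `OLD_` type.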
Node (Property) Versioning
Node versioning can be built on top of relationship versioning. Each state change is represented as a new version node, and the current state is stored both on the actual entity node and on the latest version node. This allows you to match an entity and then retrieve its state at a given point in time. Similarly, nodes that are expired can be marked via the `Old` prefix, such as `OldNodeType`.
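A similar hedged sketch for node versioning, again in plain Python rather than anything nodestream provides. `EntityNode`, `VersionNode`, and their fields are hypothetical names; the `from_ts`/`to_ts` pair on each version stands in for the versioned relationship that would link an entity to its version node in a real graph.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class VersionNode:
    properties: Dict[str, Any]
    from_ts: int                  # when this state became current
    to_ts: Optional[int] = None   # when it was superseded or expired


@dataclass
class EntityNode:
    label: str
    key: str
    properties: Dict[str, Any] = field(default_factory=dict)
    versions: List[VersionNode] = field(default_factory=list)

    def update(self, properties: Dict[str, Any], now: int) -> None:
        # Close the previous version, then record the new state twice:
        # on the entity itself and on a fresh version node.
        if self.versions:
            self.versions[-1].to_ts = now
        self.properties = dict(properties)
        self.versions.append(VersionNode(dict(properties), from_ts=now))

    def state_at(self, ts: int) -> Optional[Dict[str, Any]]:
        for version in self.versions:
            if version.from_ts <= ts and (version.to_ts is None or ts < version.to_ts):
                return version.properties
        return None  # the entity had no state at that time

    def expire(self, now: int) -> None:
        if self.versions and self.versions[-1].to_ts is None:
            self.versions[-1].to_ts = now
        if not self.label.startswith("Old"):
            self.label = "Old" + self.label


user = EntityNode(label="User", key="alice")
user.update({"name": "Alice"}, now=10)
user.update({"name": "Alice B."}, now=20)

state_at_15 = user.state_at(15)
current_state = dict(user.properties)

user.expire(now=30)
state_at_25 = user.state_at(25)
state_at_35 = user.state_at(35)
```

Reading the current state only touches `properties` on the entity, so non-temporal queries stay cheap; only historical lookups walk the version nodes.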
Describe alternatives you've considered
There are several other possible alternatives to the temporal versioning itself.
- Don't do temporal versioning at all.
- Save snapshots of the graph at an interval. This does not allow for several possible cases, including more granular lookbacks and comparing from a point in time through now. If you need granularity, you could effectively event source `nodestream`: start at some snapshot and then replay events until the relevant point in time.
- Save the data that is relevant off of the graph in an object store.
Prior Art
- https://medium.com/neo4j/keeping-track-of-graph-changes-using-temporal-versioning-3b0f854536fa
- https://community.neo4j.com/t/timeseries-network-graph/44662/3