Support Analytics Pipelines
Background
One of the primary use cases for graph databases is running analytics and ML workloads.
Requirements / Principles
If nodestream were to support analytics jobs, it should follow the same core principles as the rest of nodestream:
- Users should be decoupled from the database, allowing them to pick the best database for the job. The database connector can therefore implement the requisite hooks for running these analysis algorithms.
- Users should be able to build jobs declaratively.
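One way to picture the first principle is a connector base class whose analytics hooks are optional. The sketch below is illustrative only: `DatabaseConnector`, `supports_analytics`, `project`, `run_algorithm`, and `AnalyticsNotSupported` are hypothetical names chosen for this example, not part of nodestream's actual API.

```python
from __future__ import annotations

from typing import Any


class AnalyticsNotSupported(Exception):
    """Raised when a connector does not implement the analytics hooks."""


class DatabaseConnector:
    """Hypothetical connector base class (not nodestream's real API).

    Analytics support is opt-in: the default hooks refuse the request.
    """

    def supports_analytics(self) -> bool:
        return False

    def project(self, nodes: list[str], relationships: list[str]) -> None:
        raise AnalyticsNotSupported(type(self).__name__)

    def run_algorithm(self, algorithm: str, parameters: dict[str, Any]) -> None:
        raise AnalyticsNotSupported(type(self).__name__)


class RecordingConnector(DatabaseConnector):
    """Toy connector that records each request instead of talking to a database."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def supports_analytics(self) -> bool:
        return True

    def project(self, nodes: list[str], relationships: list[str]) -> None:
        self.calls.append(("project", tuple(nodes), tuple(relationships)))

    def run_algorithm(self, algorithm: str, parameters: dict[str, Any]) -> None:
        self.calls.append(("algorithm", algorithm, dict(parameters)))


class WriteOnlyConnector(DatabaseConnector):
    """Connector that opts out of analytics by inheriting the defaults."""
```

With this shape, a declarative job only names an algorithm and its parameters; how (or whether) it runs is decided by whichever connector the user selected.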
Implementation Details
Implementation could follow a design approach similar to the one migrations is taking. The core framework handles as much as is prudent and defers to the database connector (which can optionally support the feature) to perform the actual work of data analysis. Steps like copy and export, mentioned below, can be implemented using nodestream's existing copy and pipelines features to retrieve and map data.
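The split of responsibilities described here could look roughly like the following dispatcher. It is a minimal sketch under stated assumptions: `run_phases`, `copy_data`, `export_data`, and `LoggingConnector` are hypothetical names, and the phase dictionaries simply mirror the step types proposed in this issue (copy, project, algorithm, export).

```python
from __future__ import annotations

from typing import Any, Callable


class LoggingConnector:
    """Stand-in connector (hypothetical) that logs the work it is asked to do."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def project(self, nodes: list[str], relationships: list[str]) -> None:
        self.log.append(f"project nodes={nodes} relationships={relationships}")

    def run_algorithm(self, algorithm: str, parameters: dict[str, Any]) -> None:
        self.log.append(f"algorithm {algorithm} {sorted(parameters)}")


def run_phases(
    phases: list[dict[str, Any]],
    connector: LoggingConnector,
    copy_data: Callable[[dict[str, Any]], None],
    export_data: Callable[[dict[str, Any]], None],
) -> list[str]:
    """Run phases in order and return the names of the phases that completed."""
    completed: list[str] = []
    for phase in phases:
        step = phase["step"]
        if step == "copy":
            # Handled by the framework, e.g. via existing copy/pipeline features.
            copy_data(phase)
        elif step == "export":
            # Also handled by the framework.
            export_data(phase)
        elif step == "project":
            # Deferred to the connector.
            connector.project(**phase["projection"])
        elif step == "algorithm":
            # Deferred to the connector.
            connector.run_algorithm(phase["algorithm"], phase.get("parameters", {}))
        else:
            raise ValueError(f"Unknown step type: {step!r}")
        completed.append(phase["name"])
    return completed
```

Keeping copy and export in the framework means a connector only has to implement the two analysis-specific hooks to participate.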
Example Project File
# nodestream.yaml
scopes:
  # ... for data pipelines
analyses:
  - analyses/example.yaml
targets:
  anaylitics-graph:
    # ....
  persistent-graph:
    # ...
Example Analysis File
This example pipeline copies data from persistent-graph to anaylitics-graph. From there, it runs some topological analysis algorithms and persists the results back to persistent-graph.
# analyses/example.yaml
phases:
  # Before we can run the analysis, we need to copy the data into the graph.
  # This step will copy the data from the target specified in nodestream.yaml into the graph.
  # If you are using a persistent graph, you may not need to run this step.
  - name: Copy Data
    step: copy
    source: persistent-graph
    nodes:
      - Person
    relationships:
      - KNOWS
  # Project tells the connector which nodes and relationships to include in the analysis.
  # For instance, in the case of GDS, this will run a projection.
  - name: Project Graph
    step: project
    projection:
      nodes:
        - Person
      relationships:
        - KNOWS
  # Next are some example algorithms that we are running.
  - name: Run Weakly Connected Components
    step: algorithm
    algorithm: weaklyConnectedComponents
    parameters:
      writeProperty: community
  - name: Run Degree Centrality
    step: algorithm
    algorithm: degreeCentrality
    parameters:
      node_types:
        - Person
      relationship_types:
        - KNOWS
      # weightProperty: weight  # optional
      writeProperty: degreeCentrality
  # The export step will export the results of the analysis to the specified target.
  # The target must be specified in nodestream.yaml.
  # Internally, this will build a nodestream pipeline to extract the data from the graph and write it to the target.
  - name: Export Results
    step: export
    target: persistent-graph
    nodes:
      - type: Person
        properties:
          - degreeCentrality
          - community
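Because phases refer to targets by name, the framework could check those names against the project file before doing any work. The following is a rough sketch of such a check, not existing nodestream behavior: `undeclared_targets` is a hypothetical helper, and the dictionaries stand in for the parsed contents of nodestream.yaml and analyses/example.yaml.

```python
from __future__ import annotations

from typing import Any


def undeclared_targets(project: dict[str, Any], analysis: dict[str, Any]) -> set[str]:
    """Return target names used by phases but missing from the project file."""
    declared = set(project.get("targets") or {})
    referenced: set[str] = set()
    for phase in analysis.get("phases") or []:
        # Copy phases name a `source`; export phases name a `target`.
        for key in ("source", "target"):
            if key in phase:
                referenced.add(phase[key])
    return referenced - declared


# Stand-ins for the parsed example files (names copied from the examples).
project_file = {"targets": {"anaylitics-graph": None, "persistent-graph": None}}
analysis_file = {
    "phases": [
        {"name": "Copy Data", "step": "copy", "source": "persistent-graph"},
        {"name": "Export Results", "step": "export", "target": "persistent-graph"},
    ]
}

missing = undeclared_targets(project_file, analysis_file)
if missing:
    raise SystemExit(f"Undeclared targets: {sorted(missing)}")
```

Failing early on a misspelled target name is cheaper than discovering it after the copy and algorithm phases have already run.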
The example analysis can be run with:
nodestream analytics run example --target anaylitics-graph
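For readers unfamiliar with the two algorithms in the example, here is a small pure-Python illustration of what they compute. It is not how a connector would run them (a database such as one with GDS would do this natively). It assumes degree centrality counts outgoing relationships, which is one common convention, and it uses a node's name as the component identifier for simplicity.

```python
from __future__ import annotations


def weakly_connected_components(
    nodes: list[str], edges: list[tuple[str, str]]
) -> dict[str, str]:
    """Map each node to a component id, ignoring edge direction (union-find)."""
    parent = {node: node for node in nodes}

    def find(node: str) -> str:
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target in edges:
        root_a, root_b = find(source), find(target)
        if root_a != root_b:
            parent[root_b] = root_a
    return {node: find(node) for node in nodes}


def degree_centrality(
    nodes: list[str], edges: list[tuple[str, str]]
) -> dict[str, int]:
    """Count outgoing relationships per node (assumed convention)."""
    degree = {node: 0 for node in nodes}
    for source, _target in edges:
        degree[source] += 1
    return degree


# Tiny Person/KNOWS graph: alice -> bob -> carol, with dave isolated.
people = ["alice", "bob", "carol", "dave"]
knows = [("alice", "bob"), ("bob", "carol")]

community = weakly_connected_components(people, knows)
centrality = degree_centrality(people, knows)
```

In this graph alice, bob, and carol share one community while dave forms his own; these per-node values are what the export step would write back as the community and degreeCentrality properties.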