gremlin-ogm
gremlin-ogm copied to clipboard
An Object Graph Mapping Library For Gremlin
//// Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. //// == An Object Graph Mapping Library For Gremlin
Karthick Sankarachary http://github.com/karthicks
[.lead]
The gremlin-objects
link:gremlin-objects/pom.xml[module] defines a library that puts an object-oriented spin on the gremlin property graph.
It aims to make it much easier to specify business domain specific languages
around Gremlin, without any loss of expressive power.
While it targets the Gremlin-Java variant, the concept itself is language-independent.
=== Introduction
Every element in the property graph, whether it be a vertex (property) or an edge, is made up of properties.
Each such property is a String
key and an arbitrary Java
value.
It only seems fitting then to try and represent that property as a strongly-typed Java
field.
The specific class in which that field is defined then becomes the vertex (property) or edge, which the property describes.
A gremlin object model
such as this would need abstractions to query and update the graph in terms of those objects.
To get the library that facilitates all of this, add this dependency to your pom.xml
:
[source, pom]
A reference use case of this library is available in the following tinkergraph-test
module:
[source, pom]
=== The Object Graph
In this section, we go over how gremlin elements may be modeled, and how those models may be queried and stored.
==== Object Model
Let's consider the example of the person
vertex, taken from the "modern" and "the crew" graphs defined in the link:https://github.com/apache/tinkerpop/tree/master/tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/structure/TinkerFactory.java[TinkerFactory].
In our object world, it would be defined as a link:gremlin-objects/src/test/java/org/apache/tinkerpop/gremlin/object/vertices/Person.java[Person] class that extends link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/structure/Vertex.java[Vertex].
By default, the vertex's label matches its simple class name, hence we have to un-capitalize it using the link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/model/Alias.java[@Alias] annotation.
The person's name
and age
properties become primitive fields in the class.
The link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/model/PrimaryKey.java[@PrimaryKey] and link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/model/OrderingKey.java[@OrderingKey] annotations on them not only indicate that they are mandatory,
but also allow the person
to be found easily through the link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/traversal/library/HasKeys.java[HasKeys.of(person)] SubTraversal
.
Think of the link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/traversal/SubTraversal.java[SubTraversal] as a reusable function that takes a GraphTraversal
, performs a few steps on it, and returns it back (to allow for chaining).
The KnowsPeople
field in this class is an example of an in-line SubTraversal
, albeit a stronger-typed version of it called ToVertex
, to indicate that it ends up selecting vertices.
Note that these traversal functions are not stored in the graph.
[source, java]
@Data @Alias(label = "person") public class Person extends Vertex {
public static ToVertex KnowsPeople = traversal -> traversal .out(Label.of(Knows.class)) .hasLabel(Label.of(Person.class));
@PrimaryKey private String name;
@OrderingKey private int age;
private Set<String> titles;
private List<Location> locations; }
Next, we look at its titles
field, which is defined to be a Set
.
As you might expect, the cardinality of the underlying property becomes set
.
Similarly, the locations
field takes on the list
cardinality.
Further, each element in the locations
list has it's own meta-properties, and ergo deserves a link:gremlin-objects/src/test/java/org/apache/tinkerpop/gremlin/object/vertices/Location.java[Location] class of it's own.
[source, java]
@Data @Alias(label = 'location') public class Location extends Element {
@OrderingKey @PropertyValue private String name; @OrderingKey private Instant startTime; private Instant endTime; }
NOTE: The value of the location
is stored in name
, due to the placement of the @PropertyValue
annotation.
Every other field in the Location
class becomes the location
's meta-property.
An edge is defined much like the vertex, except it extends the link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/structure/Edge.java[Edge] class.
By default, an edge's label is it's un-capitalized simple class name, and hence no @Alias
is needed:
[source, java]
@Data public class Knows extends Edge { private Double weight;
private Instant since; }
You can find more examples of gremlin object vertices link:gremlin-objects/src/test/java/org/apache/tinkerpop/gremlin/object/vertices/[here] and edges link:gremlin-objects/src/test/java/org/apache/tinkerpop/gremlin/object/edges/[here].
==== Updating Objects
The link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/structure/Graph.java[Graph] interface lets you update the graph using Vertex
or Edge
objects.
You can get it via dependency injection, assuming you've an Object
provider for GraphTraversalSource
:
[source, java]
@Inject @Object private Graph graph;
Or, the good old fashioned way, using the link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/provider/GraphFactory.java[GraphFactory]:
[source, java]
private GraphFactory graphFactory = GraphFactory.of(TinkerGraph.open().traversal()); // This gets you the factory for TinkerGraph. private Graph = graphFactory.graph();
Now that we know how to obtain a Graph
instance, let's see how to change it using Java
objects.
Here, we create software
vertices for tinkergraph
and gremlin
, and add a traverses
edge from gremlin
to tinkergraph
.
[source, java]
graph .addVertex(Software.of("tinkergraph")).as("tinkergraph") .addVertex(Software.of("gremlin")).as("gremlin") .addEdge(Traverses.of(), "tinkergraph");
Below, a person
vertex containing a list of locations
is added, along with three outgoing edges.
[source, java]
graph .addVertex( Person.of("marko", Location.of("san diego", 1997, 2001), Location.of("santa cruz", 2001, 2004), Location.of("brussels", 2004, 2005), Location.of("santa fe", 2005))).as("marko") .addEdge(Develops.of(2010), "tinkergraph") .addEdge(Uses.of(Proficient), "gremlin") .addEdge(Uses.of(Expert), "tinkergraph")
To see how the modern
and the crew
reference graphs may be created using the object Graph
interface, go link:gremlin-objects/src/test/java/org/apache/tinkerpop/gremlin/object/graphs/[here].
TIP: Since the object being added may already exist in the graph, we provide link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/structure/Graph.java[various options] to resolve "merge conflicts", such as MERGE
, REPLACE
, CREATE
, IGNORE
AND INSERT
.
==== Querying Objects
There are two ways to get a handle to the link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/traversal/Query.java[Query] interface. You can inject it like so:
[source, java]
@Inject @Object private Query query;
Otherwise, you can create it using the GraphFactory
like so:
[source, java]
private GraphFactory graphFactory = GraphFactory.of(TinkerGraph.open().traversal()); private Query = graphFactory.query();
Next, let's see how to use the Query
interface.
The following snippet queries the graph by chaining two SubTraversals
(a function denoting a partial traversal), and parses the result into a list of Person
vertices.
[source, java]
List<Person> friends = query .by(HasKeys.of(modern.marko), Person.KnowsPeople) .list(Person.class);
Below, we query by an link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/traversal/AnyTraversal.java[AnyTraversal] (a function on the GraphTraversalSource
), and get a single Person
back.
[source, java]
Person marko = Person.of("marko"); Person actual = query .by(g -> g.V().hasLabel(marko.label()).has("name", marko.name())) .one(Person.class);
The type of the result may be primitives too, and that is handled as shown below.
[source, java]
long count = query .by(HasKeys.of(crew.marko), Count.of()) .one(Long.class);
Last, we show a traversal involving select steps, which requires special handling as it may return a map.
[source, java]
Selections selections = query .by(g -> g.V().as("a"). properties("locations").as("b"). hasNot("endTime").as("c"). order().by("startTime"). select("a", "b", "c").by("name").by(T.value).by("startTime").dedup()) .as("a", String.class) .as("b", String.class) .as("c", Instant.class) .select();
To see more examples showcasing how the object Query
interface may be used, go link:gremlin-objects/src/test/java/org/apache/tinkerpop/gremlin/object/ObjectGraphTest.java[here].
=== Providers
In this section, we talk about how the gremlin-objects
library can be customized for a graph system
provider.
==== Service Provider Interface
A provider that wishes to plug into gremlin-objects
through dependency injection, will need to provide a GraphTraversalSource
of it's choice, through the Object
qualifier.
For users that don't use dependency injection, they may manually pass the GraphTraversalSource
to the link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/provider/GraphFactory.java[GraphFactory].
==== Registering Native Types
Typically, gremlin property values are Java primitives.
Sometimes, a provider treats a custom type as a primitive.
For instance, DataStax
lets you define property keys of the primitive geometric type Point
.
Such types can be registered using the Primitives#registerPrimitiveClass
methods.
==== Registering Custom Parsers
When a GraphTraversal
is completed, it usually returns (a list of) gremlin Element(s)
.
However, when some providers execute a traversal, the result comprises custom element types.
For instance, when DataStax
executes a graph query, it returns a result set made up of GraphNode(s)
, a proprietary element type.
We give such providers a way to tell us how to parse such custom elements using the Parsers#registerElementParser
method.
=== Analysis
While there exist similar OGM libraries, this one has some key differentiating factors. Now, let's consider the alternatives:
==== GremlinDsl Traversals
The gremlin-core
module defines a link:https://github.com/apache/tinkerpop/tree/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/GremlinDsl.java[GremlinDsl] annotation that lets you define custom traversals by extending the GraphTraversal
and GraphTraversalSource
.
However, it requires some familiarity of gremlin-core
internals.
==== Peopod for Tinkerpop 3 https://github.com/bayofmany/peapod[Peopod] represents elements as annotated interfaces or abstract classes. While it generates boilerplate for traversals to adjacent vertices, it doesn't let you co-locate arbitrary traversals. This library is less intrusive and more flexible.
==== User Defined Steps
An older version of TinkerPop allowed you to define custom steps using Closures
, not unlike the AnyTraversal
and SubTraversal
functions.
However, they aren't as developer friendly as the functional interfaces provided here.
Moreover, it doesn't allow for co-locating the traversal logic along with the element model, as we do here.
=== Future Work
So far, we have the gremlin-objects
library, and a tinkergraph-test
reference use case for it.
Here, we list a few directions in which we see the library evolving:
==== Language Variants
The concept of lifting the property graph into objects is language-independent.
To quote the TinkerPop docs, "with JSR-223, any language compiler written for the JVM can directly access the JVM and any of its libraries", and that would include gremlin-objects
.
For GLVs not written for the JVM, it can be ported over as long as it supports basic reflection.
Case in point, the Gremlin-Python variant could achieve the object mapping through the https://docs.python.org/2/library/functions.html#dir[dir], https://docs.python.org/2/library/functions.html#getattr[getattr] and https://docs.python.org/2/library/functions.html#setattr[setattr] built-in functions.
==== Provider Support
In reality, it is fairly easy for a provider to plug-into gremlin-objects
simply by supplying a GraphTraversalSource
of their choosing.
The ability to register custom primitive types and traversal result parsers allows for further customization.
Since neo4j
already has it's own link:https://github.com/apache/tinkerpop/tree/master/neo4j-gremlin/src/main/java/org/apache/tinkerpop/gremlin/neo4j/structure/Neo4jGraph.java[Neo4jGraph], it's a good candidate to become the next test case.
==== DataFrame Support
Some providers use http://graphframes.github.io/[GraphFrames] to execute bulk operations and graph algorithms on top of Tinkerpop.
Assuming they can work with https://spark.apache.org/docs/1.6.3/api/java/org/apache/spark/sql/DataFrame.html[DataFrames], one could build a GraphTraversalSource
,
which translates the object Graph
and Query
operations into DataFrame
tables, and adapt's it to the provider's GraphFrame
.
==== Traversal Storage
The link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/traversal/AnyTraversal.java[AnyTraversal] and link:gremlin-objects/src/main/java/org/apache/tinkerpop/gremlin/object/traversal/SubTraversal.java[SubTraversal]
interfaces extend https://docs.oracle.com/javase/7/docs/api/java/util/Formattable.html[Formattable] so that the steps defined in it's body can be revealed.
Let's say that we stored the bytecode of these types of functional fields as a hidden property in the element.
That could potentially allow us to execute user defined traversals
using a, say, traversal.call('function-name')
step.