orientdb-labs
orientdb-labs copied to clipboard
[OEP 2] Unified Multi-Model (Document/Graph) API and class hierarchy
Reference:
https://github.com/orientechnologies/orientdb-labs/blob/master/OEP_2.md
Summary:
Review the class hierarchy of both Document and Graph APIs to create a unified hierarchy and a unified Multi-Model API.
Goals:
- Finally have a single, Multi-Model API
- Better interoperability between Document and Graph API
- Have a single way to use all the OrientDB functionalities with a single API
- Make it easier to choose the API (no choice at all)
- Make it easier to write docs for the API (no more duplicated docs for document and graph)
Non-Goals:
- deprecate or remove any of the existing data models (Document vs. Graph)
Motivation:
In OrientDB 2.2, Document and Graph APIs are distinct and very different (see doc.field() and vertex.getProperty()). This creates the following problems:
- New users have a hard time choosing the API to start with
- Query results sometimes return a mix of the two models, so it's hard to write consistent code
- We have to write an maintain duplicate docs, one of each API
- We lack the core requirement of a Multi-Model database: a unified Multi-Model API
Description:
Now we have two separate class hierarchies, one for ODocument and one for OrientElement, where the only super-interface is OIdentifiable, but it obviously lacks all the methods for the domain manipulation.
The proposal is following:
- Create a common interface T for both ODocument and OrientElement, with methods to
- get/set properties
- access and manipulate metadata (OClass)
- Have two sub-interfaces of this interface, one for vertices (T1) and one for edges (T2), with specific methods for graph traversal (getEdges(), getVertices(), or maybe out()/in()?)
- Create a new implementation of O_Vertex and O_Edge in the Core module and review their implementation to extend what currently is ODocument. Let OrientVertex and OrientEdge extend this new implementation.
- Unify the implementation of all the hierarchy, to use the same methods for property and schema manipulation
- Refactor ODatabase interface to
- let all the methods return T or Generics<T>
- implement methods to instantiate and retrieve both T and T1/T2 explicitly
- See if it's possible to refactor OrientGraph to implement both ODatabase and TinkerPop Graph.
Alternatives:
- only create the interface hierarchy but avoid to have graph classes extend ODocument class
Risks and assumptions:
- collision with method signatures of other standards (eg. next TinkerPop versions)
Impact matrix
- [ ] Storage engine
- [ ] SQL
- [ ] Protocols
- [ ] Indexes
- [ ] Console
- [x] Java API
- [ ] Geospatial
- [ ] Lucene
- [ ] Security
- [x] Hooks
- [ ] EE
I agree with this enhancement. Unifying the different models' API will make orientdb more comprehensible and usable for users. Moreover IMHO it should increase the perception of working with a real multi-model database. I've never had completely this impression when I was a simple orientdb user.
I agree on this, i would just add some bits, we should have our graph api not dependent to the ThinkerPop API, this should be native and optimized for our Multi-Model cases, anyway the layer of mapping betwheen our API and the ThinkerPop should be as thin as possible, meaning that the behaviour of the api should be really similar.
In genera in term of api i agree on the specified hierarchy, going on the specific i would like to have this api: ODocument:
- boolean isVertex()
- OVertex asVertex()
- boolean isEdge()
- OEdge asEdge()
OVertex:
- Set<OVertex> out(String label)
- Set<OVertex> in(String label)
- Set<OEdge> outE(String label)
- Set<OEdge> inE(String label)
- Set<OVertex> both(String label)
- Set<OEdge> bothE(String label)
- OEdge connect(String label, OVertex target)
- boolean disconnect(OEdge edge) //to evaluate
- boolean disconnect(String label, OVertex target) //to evaluate
OEdge:
- OVertex out()
- OVertex in()
I don't like "connect/disconnect" very much as a name, but the concept is excellent
probably we can stick to the ThinkerPop/Sql like "createEdge" "deleteEdge"
Implementation proposal
New interfaces and classes
interface OElement extends OIdentifiable {
public void setProperty(String name, Object val);
public Object getProperty(String name);
public void delete();
public void save();
}
interface OVertex extends OElement {
public Collection<OEdge> getEdges(Direction direction);
public Collection<OEdge> getEdges(Direction direction, String... type);
public Collection<OEdge> getEdges(Direction direction, OClass... type);
public Collection<OVertex> getVertices(Direction direction);
public Collection<OVertex> getVertices(Direction direction, String... type);
public Collection<OVertex> getVertices(Direction direction, OClass... type);
public OEdge addEdge(OVertex to);
public OEdge addEdge(OVertex to, String type);
public OEdge addEdge(OVertex to, OClass type);
}
interface OEdge extends OElement {
public OVertex getFrom();
public OVertex getTo();
}
class ODocument extends OElement {
...
}
class OVertexImpl extends ODocument implements OVertex {
...
}
class OEdgeImpl extends ODocument implements OEdge {
...
}
Open questions:
- Should OElement extend OIdentifiable? What about result sets (see OEP_5)?
- In case we decide to have OElement that does not extend OIdentifiable, should we still have save() and delete() methods? Do we need another interface in the middle?
- Add asVertex() and asEdge() methods to OElement?
from my point of view, no save, no detete, you should use directly the db.
+1 asVertex, asEdge on the top.
i prefer in
+ out
version, but not a requirement ;)
+1 for db.save() db.delete() We have to define the db interface as well, and the expected behaviors. Eg. how does a db.delete(vertex) affect the POJOs? Does it also disconnect the edge pointers of the connected edges?
New proposal for OElement:
interface OElement extends OIdentifiable {
public void setProperty(String name, Object val);
public Object getProperty(String name);
public Optional<ODocument> asDocument();
public Optional<OVertex> asVertex();
public Optional<OEdge> asEdge();
public boolean isDocument();
public boolean isVertex();
public boolean isEdge();
public Optional<OClass> getType();
}
From last discussion with @tglman:
OElement and OIdentifiable should remain two different hierarchies, because result sets could return projections, that are OElement instances, but do not have an identity.
interface OElement {
public void setProperty(String name, Object val);
public Object getProperty(String name);
public Optional<ODocument> asDocument();
public Optional<OVertex> asVertex();
public Optional<OEdge> asEdge();
public boolean isDocument();
public boolean isVertex();
public boolean isEdge();
public Optional<OClass> getType();
}
Why isDocument()? All the OElement are implicitly documents. Or do you mean returning true when is nor edge neither vertex? How do you know if this is a projection?
@lvca right, there is something missing here, the isDocument() is related to
https://github.com/orientechnologies/orientdb-labs/issues/5
The idea is that a result set will return OResult, that is an OElement but not and ODocument, so that you cannot save it directly (as you could see there is already a lot of discussion on the other thread about this ;-) )
Luigi
I don't like the isDocument/Vertex/Edge. If I'm executing a query like select from V
, I'm expecting to have vertices, so I'd prefer in the result set there are OVertex instances. Calling asVertex()
every time I want the actual vertex is overkill. I liked more the first proposal, but I don't think OVertexImpl and OEdgeImpl should extend ODocument. They could be implemented in a totally different way, where the properties could be wrapped in a Document like structure, but the edges could be stored in RAM in different way?
Hi @lvca
The only point of this OEP is to have a common hierarchy between Document and Vertex/Edge, use a common API and avoid encapsulation. I think this hierarchy should be in the Core, then single TinkerPop implementations can extend these classes or can encapsulate them (like current TP API does with ODocument).
The initial proposal was to have an OElement in the result set, so that, if you are sure it's an OVertex, you can just cast it and use it as is
Actually there is no need for OVertex to extend ODocument, I just need it to implement OElement and OIdentifiable. Users should only use interfaces in next releases, so no ODocument anymore in user code ;-)
I like the OElement and the fact we can break the graph api to be in the core (finally!), but the method asXXX are useless: if the OElement.isVertex() == true, then I can just cast it to OVertex, right?
yep, this was the initial proposal and IMHO it's acceptable.
I think @tglman has something in mind about lazy fetching...
i would say semantically this :
OVartex v = element.asVertex();
OVertex v = (OVertex) element;
are exactly the same thing.
if the element it actually come from a query the second one it can be no true. all the other load case as lazy loading on get i would say the two case are exactly the same. so linking myself to the other OEP: OResult
shoud have asVertex()
OElement(doc,vec,edge) itself is not a requirement, is more uniformity
Agree 100% on unified interface. As I am trying to learn OrientDB I have found it confusing to have these two models (Doc vs Graph) and to try and figure out when do I use one vs the other and the pros/cons of each.
Additional proposal, always related to the APIs, but this time about the DB.
I'd like to add to ODatabase the following methods:
public OClass createClass(String className, String... superclasses);
public OClass createClassIfNotExists(String className, String... superclasses);
public OClass getClass(String className, String... superclasses);
public OClass createVertexClass(String className);
public OClass createEdgeClass(String className);
They just have to delegate to the schema, so they can be implemented as default methods in the interface itself.
The ratio behind it is to give users a shortcut for very common operations.
The first thing a user does in a DB is to create a class, now he has to do
OSchema schema = db.getMetadata().getSchema();
if(!schema.existsClass(className)){
schema.createClass(className);
}
With the new API it would just be
db.createClassIfNotExists(className);
WDYT?
Thanks
Luigi
So what about the index API?
I was waiting for this question ;-) Indexes are mainly related to classes, so IMHO it's OK to get a Class from the DB and use it to manipulate its indexes. Everything is clear and simple, every component has its own responsibility:
- classes belong to the DB
- indexes belong to classes
Manual indexes will be considered an advanced feature, so it's OK for me to let them accessible only from the metadata objects, without cluttering the basic db interfaces.
WDYT?
Thanks
Luigi
Hi,
Yes i had every time the same feeling about the creation/manipulation of the classes, and my problem has been that we will end up cluttering to much the database API, anyway i pro if we can find the right balance from what is in the shortcut and what in 'metadata'.
so for me ok to add on the database API shortcut for class manipulation and maybe user manpulation, and keep index, functions, scheduler, and advanced security(roles ecc) in metadata.
so i would add as well:
void dropClass(String className);
and maybe some api like:
OUser addUser(String username);
void dropUser(String username);
but that's it.
Hi guys, Ok for me. And this is the same reason why I put basic class methods in OrientBaseGraph.
Just pushed the first version of this multi-model API to develop branch
Hi,
This has been fully implemented since 3.0.x so we can consider this closed.
Regards