Unique index with neo4j
Hi! I was looking for unique indexes with Bulbs and Neo4j and found this in the source:
# Uncdocumented -- experimental
class UniqueIndex(ExactIndex):
    pass

class Property(object):
    ...
    # These aren't implemented yet.
    # TODO: unique creates an index
    self.indexed = indexed
    self.unique = unique
Could you give me a status update on that? Do you need help?
Bulbs has put_unique and get_unique on the default Neo4j index class (ExactIndex) -- I'm not sure if a separate UniqueIndex class is needed:
https://github.com/espeed/bulbs/blob/master/bulbs/neo4jserver/index.py#L295
But these methods predate Neo4j's new unique endpoints, so it would be better to use the new ones:
POST http://localhost:7474/db/data/index/relationship/people?unique
See http://docs.neo4j.org/chunked/snapshot/rest-api-unique-indexes.html
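As a rough sketch of what a call to that endpoint could look like from Python, the snippet below builds the request without sending it. The helper name build_unique_request is made up for illustration, and the payload shape (key, value, plus properties for a node) is an assumption based on the REST docs of that period, so check the linked page for the server version in use.

```python
import json
from urllib.request import Request

BASE = "http://localhost:7474/db/data"  # default local server, as in the URL above


def build_unique_request(element, index_name, key, value, extra):
    """Build (but do not send) a POST to the ?unique index endpoint.

    element is "node" or "relationship"; extra holds the rest of the payload
    (assumed here to be {"properties": {...}} for a node).
    """
    url = "%s/index/%s/%s?unique" % (BASE, element, index_name)
    payload = dict(extra, key=key, value=value)
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


req = build_unique_request("node", "people", "name", "Tobias",
                           {"properties": {"name": "Tobias"}})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the result to urllib.request.urlopen would actually send it; that step is left out so the sketch runs without a server.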
Add it to the Bulbs issue tracker or even better, send me a pull request :)
It should be simple if you want to update Neo4jClient to add the put_unique_vertex and put_unique_edge methods -- just copy put_vertex and put_edge and add "?unique" to the path (not the URI var because that's the edge URI you are indexing).
See https://github.com/espeed/bulbs/blob/master/bulbs/neo4jserver/client.py#L837
And https://github.com/espeed/bulbs/blob/master/bulbs/neo4jserver/client.py#L921
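A minimal sketch of that change, using a stand-in client rather than the real Neo4jClient. Every name below (SketchClient, RecordingRequest, the "index" and "node" path segments, the key/value/uri parameters) is an assumption for illustration; the actual put_vertex in client.py should be the template. The only point is that "?unique" goes on the index path while uri stays the URI of the element being indexed.

```python
class RecordingRequest(object):
    """Stand-in for the HTTP layer: records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def post(self, path, params):
        self.calls.append((path, params))
        return {"path": path, "params": params}


class SketchClient(object):
    """Hypothetical client; not the real Bulbs Neo4jClient."""

    def __init__(self, request):
        self.request = request

    def put_vertex(self, index_name, key, value, uri):
        path = "/".join(["index", "node", index_name])
        params = dict(key=key, value=value, uri=uri)
        return self.request.post(path, params)

    def put_unique_vertex(self, index_name, key, value, uri):
        # Same as put_vertex, except "?unique" is appended to the index path.
        # The uri parameter is left alone: it identifies the vertex to index.
        path = "/".join(["index", "node", index_name]) + "?unique"
        params = dict(key=key, value=value, uri=uri)
        return self.request.post(path, params)


client = SketchClient(RecordingRequest())
resp = client.put_unique_vertex("people", "name", "James",
                                "http://localhost:7474/db/data/node/1")
print(resp["path"])
```

A put_unique_edge would follow the same pattern with the relationship index path.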
Then simply update the ExactIndex class to use the new methods:
https://github.com/espeed/bulbs/blob/master/bulbs/neo4jserver/index.py
Note that if you are using the unique index inside a transaction, the unique REST endpoint probably isn't useful. For transactions, you can create a Gremlin script that uses the Neo4j Java API methods for this.
See...
Unique Nodes (this should work for relationships too if you substitute "relationship" for "node"). http://docs.neo4j.org/chunked/stable/transactions-unique-nodes.html
Put if Absent http://components.neo4j.org/neo4j/1.7/apidocs/org/neo4j/graphdb/index/Index.html#putIfAbsent%28T,%20java.lang.String,%20java.lang.Object%29
- James
(In reply to Emmanuel Tabard's message of Tue, Aug 21, 2012 at 10:13 AM.)
Bulbflow: A Python framework for graph databases (http://bulbflow.com)
Thanks for the quick answer! Here is my use case:
- I need unique, non-reusable IDs across nodes and relationships.
- So I extended the Node model with a property id = String(default=generateUniqueId, unique=True)
- My ID generator should produce unique IDs, but I want a unique constraint on the index to be sure.
- However, I saw that Neo4j has separate indexes for nodes and relationships, so you can never be sure, and you have to look up both the vertex index and the edge index.
I don't know the best practices here. If you have a recommendation, I'm all ears :)
Yeah, the permanent ID thing is an issue that's been brought up several times on the Neo4j list. They keep saying they're going to make ID reuse configurable, but I don't think the option is in yet.
This also comes into play with Titan, backed by Cassandra.
But even if Neo4j had the ID reuse option, if you start out using Neo4j for a Web app and use the ID in the URL, and then later you need to scale up to something like Titan, you'll break all your links (or at least you'll have an ID mapping layer to mess with).
All this to say, I think creating your own unique ID makes sense in several situations, although I might call it a "key" instead of an ID to avoid confusion.
Note that the new Neo4j unique endpoints behave differently than the Bulbs methods. When an element already exists, the endpoints don't update it -- they return the existing element instead.
See http://docs.neo4j.org/chunked/snapshot/rest-api-unique-indexes.html
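To make the difference concrete, here is a small in-memory sketch of the two behaviors. A plain dict stands in for the index, and both function names are hypothetical; put_overwrite is only a contrast case and is not claimed to match what the Bulbs methods do internally.

```python
def get_or_create(index, key, value, properties):
    """Return the element stored under (key, value), creating it if absent.

    An existing element is returned untouched; `properties` is ignored then.
    """
    return index.setdefault((key, value), dict(properties))


def put_overwrite(index, key, value, properties):
    """Contrast case: always replace whatever is stored under (key, value)."""
    index[(key, value)] = dict(properties)
    return index[(key, value)]


index = {}
first = get_or_create(index, "key", "abc", {"name": "first"})
second = get_or_create(index, "key", "abc", {"name": "second"})
print(second["name"])  # the existing element comes back unchanged

replaced = put_overwrite(index, "key", "abc", {"name": "third"})
print(replaced["name"])
```

With get-or-create semantics the caller has to compare the returned element with what it tried to write if it needs to know whether the write took effect.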
How are you creating the unique IDs?
- James
(In reply to Emmanuel Tabard's message of Tue, Aug 21, 2012 at 11:18 AM.)
I started with my own random implementation, sha1("%s%s" % (random(), time())), and ended up with the BSON ObjectId specification:
4-byte timestamp (seconds since epoch), a 3-byte machine id, a 2-byte process id, and a 3-byte counter
And I added an extra byte to type my IDs: for example, 0 for vertices and 1 for edges.
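A sketch of such a generator, following the byte layout described above. The function name generate_key, the use of random bytes in place of a real machine id, and the placement of the type byte at the end are assumptions for illustration, not the code actually used in this thread.

```python
import binascii
import os
import struct
import threading
import time

VERTEX, EDGE = 0, 1  # type byte values from the description above

_MACHINE = os.urandom(3)  # assumption: random bytes stand in for a machine id
_counter = int.from_bytes(os.urandom(3), "big")  # random starting point
_lock = threading.Lock()


def generate_key(element_type):
    """Return a 26-character hex key: 12 ObjectId-style bytes plus 1 type byte."""
    global _counter
    with _lock:
        _counter = (_counter + 1) % 0x1000000  # 3-byte counter wraps around
        count = _counter
    raw = (struct.pack(">I", int(time.time()))         # 4-byte timestamp
           + _MACHINE                                   # 3-byte machine id
           + struct.pack(">H", os.getpid() % 0x10000)   # 2-byte process id
           + count.to_bytes(3, "big")                   # 3-byte counter
           + bytes([element_type]))                     # extra type byte
    return binascii.hexlify(raw).decode("ascii")


vertex_key = generate_key(VERTEX)
edge_key = generate_key(EDGE)
print(vertex_key, edge_key)
```

Because the timestamp is big-endian and comes first, keys generated in different seconds sort chronologically as plain strings.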
Did you look at UUIDs?
(In reply to Emmanuel Tabard's message of Wed, Aug 22, 2012 at 8:50 AM.)
Sure, that was the first thing I looked at. But I prefer the idea of having "logical" IDs instead of the black magic of UUIDs. I like being able to sort indexes by the 4-byte timestamp and maybe archive old nodes if I really need to. Both solutions are great; this one just seems smarter to me.
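For illustration, the timestamp can be read back from a hex-encoded ID like this. The sketch assumes the ID is stored as lowercase hex with the 4-byte timestamp first and the type byte last; the two sample IDs are hand-built, not real data.

```python
import datetime


def key_timestamp(key):
    """Decode the leading 4-byte big-endian timestamp of a hex key."""
    seconds = int(key[:8], 16)
    return datetime.datetime.fromtimestamp(seconds, datetime.timezone.utc)


# Hand-built example keys: 8 hex chars of timestamp, 16 of filler, 2 of type.
older = "50339f80" + "00" * 8 + "00"
newer = "5034f100" + "00" * 8 + "01"

keys = sorted([newer, older])  # plain string sort is chronological order
print(keys[0] == older)
print(key_timestamp(older).isoformat())

# Archiving candidates: everything created before a chosen cutoff.
cutoff = key_timestamp(newer)
to_archive = [k for k in keys if key_timestamp(k) < cutoff]
print(to_archive)
```

The string sort only matches time order when every key has the same width and case, which fixed-length hex encoding guarantees.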
Ok, well enforcing global uniqueness via separate vertex and edge indices won't work so you're going to have to rely on the ID algo. And I agree, sortability is a nice feature, which will come into play in Titan/Cassandra.
Note that the "unique" Property attribute is not implemented yet:
https://github.com/espeed/bulbs/blob/master/bulbs/property.py#L61
When I was writing the Bulbs 0.3 release, Neo4j unique indexes were in a state of flux so I postponed it until things were finalized, but I'm not sure they have yet. Neo4j is redoing its entire indexing framework so that indexing is automatic -- this is important for Cypher because Cypher doesn't have any notion of indexing.
Let me know if you have ideas on implementing a stopgap unique-index Property attribute.
The ObjectId spec (used by MongoDB) ensures the uniqueness of the IDs. On top of that, the last byte (0 for a vertex, 1 for an edge) keeps the two indexes apart, so a conflict between the vertex index and the edge index is impossible.
I need to read more about neo4j unique indexes and I'll come back to you as soon as I have an idea !
You talk a lot about Titan. Do you prefer it over Neo4j? I know it's scalable, but is it as powerful as Neo4j? Does Bulbs support it as well?
Sorry for the lengthy discussion :)
Titan is a distributed OLTP graph DB, and Marko (the creator of Gremlin) is one of the guys behind it. It has pluggable backend storage (HBase, Cassandra, or BDB so far), and has much higher write performance.
One of Titan's recent benchmarks shows it can do 5000-10000 transactions per second, vs a few hundred for Neo4j (http://thinkaurelius.com/2012/08/06/titan-provides-real-time-big-graph-data/). Neo4j should be faster for traversals since everything is local, but I haven't seen an apples to apples comparison.
Titan can run in Rexster, and Bulbs will support Titan via the Rexster adapter. It does indexing differently than all the other Blueprints DBs so I'll have to rework the Rexster indexing code to make it work with Bulbs.
Wow, awesome! Next project, I'll give it a try!
Thanks a lot :)
Has there been any progress with declaring uniqueness of elements? I would be happy to contribute the code if I could get some guidance on where to start.
Neo4j is about to release its new indexing framework, which will include autoindexing and unique indexes.
See https://groups.google.com/d/msg/neo4j/d73yLtLqPw4/HM3qxBpn0JcJ
Since Neo4j and Titan are the two primary DBs Bulbs supports (through Neo4j Server and Rexster), anything done prior to seeing Neo4j's new indexing framework would be a stop-gap measure.
Indexing is one of the things that varies significantly from DB to DB, so to accommodate this, there will have to be unique_index methods in each server's Client class, each written to a universally compatible method signature.
Then the DB's index classes would have methods using these Client methods.
And finally we would add support for unique=True in the Model and Property class.
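The three layers described above could be sketched roughly as follows. This is not Bulbs code: the class names are reused only for orientation, InMemoryClient is a toy backend, and raising ValueError on a duplicate is one possible choice (an endpoint that returns the existing element would need different handling).

```python
class Client(object):
    """Per-server client: each backend implements the same signature."""

    def put_unique_vertex(self, index_name, key, value, data):
        raise NotImplementedError


class InMemoryClient(Client):
    """Toy backend standing in for a Neo4j Server or Rexster client."""

    def __init__(self):
        self.indices = {}

    def put_unique_vertex(self, index_name, key, value, data):
        index = self.indices.setdefault(index_name, {})
        if (key, value) in index:
            raise ValueError("%s=%r already indexed" % (key, value))
        index[(key, value)] = data
        return data


class Index(object):
    """Index class that delegates to the client's unique method."""

    def __init__(self, client, name):
        self.client, self.name = client, name

    def put_unique(self, key, value, data):
        return self.client.put_unique_vertex(self.name, key, value, data)


class Property(object):
    def __init__(self, unique=False):
        self.unique = unique


def save(index, properties, data):
    """Model-level step: route unique=True properties through put_unique."""
    for name, prop in properties.items():
        if prop.unique:
            index.put_unique(name, data[name], data)
    return data


people = Index(InMemoryClient(), "people")
schema = {"key": Property(unique=True), "name": Property()}
save(people, schema, {"key": "abc", "name": "first"})

rejected = False
try:
    save(people, schema, {"key": "abc", "name": "second"})
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

Keeping the uniqueness logic in the client means the Index and Model layers stay the same when a backend handles unique indexes differently.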
Bulbs 0.4 will include support for Neo4j's new indexing framework.