
objective

Open SepiaGroup opened this issue 11 years ago • 76 comments

Louis,

I am interested in implementing this in .NET as well. What is your objective in doing this? Are you planning to continue this and port neo4j to .NET?

Michael

SepiaGroup avatar Apr 26 '13 21:04 SepiaGroup

Hello Michael,

I discovered in the last few days that you implemented a neo4j REST client that looks very nice.

My objective is not to create another neo4j .Net client, but rather to use an existing one to implement a Blueprints Property Graph with blueprints-core .Net.

In fact, it would be very nice to see an implementation that uses both Frontenac and Neo4jRestNet. Maybe we could unite our efforts?

There are a lot of people who have ported Blueprints implementations or communication protocols for other graph databases to C#:

Tomas Bosak created OrientDB-NET.binary, ArangoDB-NET, and rexster.net

Daniel Kuppitz has RexProClient

Also, there is a .NET RDF library available at http://www.dotnetrdf.org/, and the Social Media Research Foundation created NodeXL.

We could all benefit from a common set of interfaces for accessing all of these great libraries in a standard way. I think blueprints-core .Net and its test suite can help here.

Loupi avatar Apr 27 '13 02:04 Loupi

Louis,

I wrote Neo4jRestNet because, at the time, I was looking for a REST lib for neo4j in C# and could not find one I liked. I did not like the way Neo4jClient was implemented at the time, but to be fair, I have not looked at it since I wrote mine, so maybe it has changed. But my interest in what you are doing is not because of a REST API client.

I like neo4j and use it for one of my clients. Let me first say that I don't have anything against Linux or the open source community, but take what I am going to say next knowing that I am a .NET guy and not a huge Java developer. I just don't like Java all that much (not that there is anything wrong with Java, but it is kind of verbose and a pain to write in compared to C# or F#, and there is no LINQ library). So with that said, I have been thinking that writing a graph data store in .NET using C# and F# would be interesting and very useful for the .NET community, especially if it would run on Mono. The Cypher language would be straightforward in F#, and the functional nature of F# and LINQ is very well suited for traversing nodes/edges.
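To illustrate the kind of LINQ-style traversal being discussed, here is a minimal sketch; the Vertex/Edge types and the "knows"/"name" identifiers are hypothetical stand-ins, not taken from any of the libraries mentioned in this thread.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory graph types, only to illustrate LINQ-style traversal.
class Vertex
{
    public long Id;
    public Dictionary<string, object> Properties = new Dictionary<string, object>();
    public List<Edge> OutEdges = new List<Edge>();
}

class Edge
{
    public string Label;
    public Vertex Target;
}

static class TraversalExample
{
    // "Friends of friends named Bob", expressed as a LINQ query.
    public static IEnumerable<Vertex> FriendsOfFriendsNamedBob(Vertex start)
    {
        return start.OutEdges
            .Where(e => e.Label == "knows")
            .Select(e => e.Target)
            .SelectMany(friend => friend.OutEdges
                .Where(e => e.Label == "knows")
                .Select(e => e.Target))
            .Where(v => v.Properties.TryGetValue("name", out var name) && (string)name == "Bob")
            .Distinct();
    }
}
```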

I do understand that what you are writing is Blueprints and not a graph data store, but it is something that would need to be developed if you wanted to make your own store. I don't know if this sounds like a foolish idea (you are more than welcome to say it is crazy), but I think a quality open source graph DB in .NET would get some (maybe not a lot of) attention and would be interesting to do in C#/F#.

So if you don't think this is too crazy of an idea, let me know.

Thanks, Michael

SepiaGroup avatar Apr 27 '13 03:04 SepiaGroup

Hi guys,

I understand that Louis is doing a TinkerPop stack port, but I would like to add my input regarding the .NET-based graph store that Michael was talking about.

First of all, I don't think that making a neo4j-based .NET port would make much sense, for two main reasons:

  1. It's built with a Java-specific API and philosophy. Now, to be clear, I'm not saying that Java is a bad or worse technology, but (apart from a very similar C-based syntax and, here and there, a roughly similar API) .NET has a somewhat different set of principles which could/should be used to exploit a competitive advantage in many areas.
  2. If you only do a port of it, you would probably end up constantly catching up with the latest (or not-so-latest) releases, given neo4j's position on the market, its number of developers, and its wide community.

Therefore, I think it would make much more sense to create a .NET graph store which is not a spinoff of any Java-based graph database, although some ideas/principles would be interesting to have, such as a LINQified Cypher language. I don't know what the state of existing .NET graph databases on the market is, but I'm sure there are some, although probably none of them is as popular as neo4j or other Java-based graph stores, which is a mystery to me since, I think, there is a lot of potential, plenty of use cases, and many users/developers for graph databases built on top of the .NET stack.

yojimbo87 avatar Apr 27 '13 09:04 yojimbo87

Tomas,

I could not agree more! I am not suggesting porting neo4j; .NET has a lot of features that are very well suited to making a very efficient graph store, and they should be used. Not to mention that the power of LINQ/PLINQ and the conciseness of F# would be a huge benefit.

On the topic of .NET graph stores, there are a few that I know of:

Trinity: written by MS, but it is only for research: http://research.microsoft.com/en-us/projects/trinity/

Sones: http://www.sones.de/static-en/ but they went out of business. They were a proprietary database, which could explain why they did not make it. If you search around about them you will see that they did receive a lot of funding. When they went under they opened up their source code, which you can still download.

Other than that, I have not seen much beyond simple hacks at doing this.

I too am confused about why there is not a viable graph store on the .NET stack, or at least on Azure. However, I do feel that it is something that is sought after (just look on SO) and would be used. Having a .NET store would simplify my current project, and I would have used one if it were out there.

Thoughts? Michael

SepiaGroup avatar Apr 27 '13 13:04 SepiaGroup

Hi guys!

I agree with both of you, and no, you are not crazy Michael, just ambitious, and that's OK! I too thought I was crazy to port Blueprints to .NET; now I'm not alone! ;)

I also think that LINQ and a lot of other .NET technologies would make a lot of sense in a graph database framework. I would be very interested in contributing to such a framework in .NET.

Talking about spinoffs, I may be facing some Java spinoff issues too: https://github.com/dkuppitz/rexpro-client/issues/3 Maybe it's time to refactor! :) I'm already seeing a lot of these issues/(opportunities to adapt to .NET/LINQ) arise in the upcoming ports of Pipes and Gremlin. Please let me know what you think about that.

About existing stores: Trinity is dead. Microsoft is now in a partnership with Hortonworks, who will bring a SQL Server add-on that can read/write to Hadoop stores.

There is also VelocityDB

Loupi avatar Apr 27 '13 14:04 Loupi

Hortonworks is for Hadoop - not really a graph DB.

VelocityDB - interesting, but again it is an object store, not a graph DB. When I looked at it a year or so ago, it was not a good replacement for neo4j.

I did see the comment about you using Java names - stop doing that. :) I use ReSharper to keep me consistent. For the most part I like it, but it does slow down VS a little.

Well, not to sound too ambitious - I am interested in doing this for no other reason than to develop something that interests me.

SepiaGroup avatar Apr 27 '13 15:04 SepiaGroup

I think the biggest challenge is implementing the storage engine, which involves CAP theorem related problems. In the Java world the situation is much simpler, since there are a lot of engines out there that could be used for this stuff. For example, if I remember correctly, neo4j is using Lucene and Titan runs on Cassandra. That way you don't have to deal with low-level stuff and can concentrate on core functionality. This approach is kind of problematic in the .NET world, since it would mean the database solution is not purely .NET based.

yojimbo87 avatar Apr 27 '13 18:04 yojimbo87

There is a port of Lucene for .NET. Officially released Oct 2012.

Loupi avatar Apr 27 '13 21:04 Loupi

Well, the actual storage is the reason I have not started. I have not found a way on Windows to handle the storage in an efficient, concurrent, and fault tolerant way. There are .NET memory-mapped files, which are the closest thing I can find in .NET. Not sure if they are a good fit, however.

As far as indexing using Lucene - I think we can solve this after we figure out how to handle the actual storing of data to disk.

Any suggestions?

SepiaGroup avatar Apr 27 '13 21:04 SepiaGroup

Looking at "The-Benefits-of-Titan", Titan also supports Berkeley DB as a storage mechanism. It offers CA from CAP theorem. I don't know about performances here too, and it is not distributed. Quickly looking at Berkeley DB docs, it is written in C, can run on Windows and there is a C# API available (P/Invoke). It's a bit like SQLite.

There is also Microsoft ESENT.

RavenDB uses Lucene.NET and ESENT under the hood.

Loupi avatar Apr 27 '13 22:04 Loupi

Correct me if I am wrong here, but Titan is another graph DB. It was/is written by the guys at TinkerPop. They built Titan to use either HBase, Cassandra, or Berkeley DB, depending on how you want CAP.

Now, I don't know what neo4j uses, but their DB is on a single instance and replicated to other nodes in an HA configuration using ZooKeeper.

In .NET land I don't know of anything that is similar to these packages. This is where .NET really is lacking, and why large systems are built on Linux/Java.

But with that said, sones did create a graph DB in .NET.

The source code is here:

https://github.com/sones/sones

I will look more into how they physically write/update data to disk - to be honest, I really don't know how they do it, and I have only looked at the code once before. :)

The code is well documented, uses a lot of LINQ/PLINQ, and is very interesting.

Let me know if you make any headway on deciphering it.

SepiaGroup avatar Apr 28 '13 01:04 SepiaGroup

You're right about Titan. I like its idea of abstracting the data storage layer.

I found these for neo4j: "An overview of neo4j internals" and "Rooting out redundancy - The new Neo4j Property Store".

I'm going to look at the sones source to see if I can understand its storage mechanism.

Being optimistic, I think that both RavenDB (a .NET NoSQL database) and BerkeleyDB could be used to perform storage in a similar way to how Titan and neo4j do. They both offer replication, and RavenDB is scalable and supports failover.

Loupi avatar Apr 28 '13 02:04 Loupi

Ok, I tried to basically understand how sones works. I must say that I'm impressed.

There is a service layer, with a plugin architecture. IGraphDS is the sones service interface. GraphDSServer serves it, and GraphDSClient consumes it.

Plugins can be created for the query language, to import and export graphs, and to perform indexing.

It has 2 indexing plugin implementations: Lucene.NET and Memory Based.

GraphDSServer uses an IGraphDB to perform its operations on a graph database. There is 1 IGraphDB implementation: SonesGraphDB.

SonesGraphDB internally uses an IGraphFS to perform IO operations. I could only find one implementation of IGraphFS: InMemoryNonRevisioned.

Maybe I missed something, but I'm under the impression that sones does not store anything on disk (apart from its IO plugins, and that's not what we are looking for here). In their external libraries I found BplusDotNet, which could be used to serialize B+ trees, but I could not find where it is used.

Loupi avatar Apr 28 '13 05:04 Loupi

I looked at sones more last night, and that is exactly what I came to understand as well. Maybe they do have a store plugin but have not shared it.

I looked at the links you sent; the first one is very interesting and informative. I seem to remember reading a tech paper on storing graph data similar to this; I will see if I can find it again. But it looks like they developed their own file system layer on top of standard Java IO calls, something like mapped files. Have you seen memory-mapped files in .NET? They may be something close to what they are using for the actual IO.

I will look more into your suggestion of using a DB like Titan does. That may be a faster way to get started. The idea of abstracting the file system, like Titan does, is a very good one.

SepiaGroup avatar Apr 28 '13 15:04 SepiaGroup

Louis,

After reading the few links you sent and a good night's sleep, I may have an approach (maybe).

All actions on the graph can be done in a sorted order that is reproducible every time. That is, create a node, then add properties, will produce the same result every time and makes sense. (Add properties then create node does not make sense - i.e. it is in the wrong order.)

So if you write the actions out to a transaction log file in sorted order, appending to the bottom of the file, and read them off the top and process them into the graph data file, you should not have any issues implementing this. If you cache the graph data in memory and make changes to this cache, you will also write the changes to the log file. So new nodes and updated properties show up instantly in memory and are written some time later to the transaction log file. You should not have to go to the graph data file for this data, since it is already in memory. If you are asked to look at a node that is not in memory, you make a call to read it from disk and cache that node (and other data if needed). Once it is in memory you proceed to update it and write the updates to the transaction log file. This approach should not be too difficult in .NET, and a simple proof of concept should be easy to bang out.
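A minimal sketch of this append-only log plus in-memory cache idea (the types, log record layout, and JSON serialization here are illustrative assumptions, not a real design):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Text.Json;

// Hypothetical log record; a real store would use a compact binary layout.
record LogEntry(long Sequence, string Operation, long NodeId, string? Key, string? Value);

class WriteAheadLog : IDisposable
{
    private readonly StreamWriter _writer;
    private long _sequence;

    public WriteAheadLog(string path)
    {
        // Append-only: new entries always go to the end of the file.
        _writer = new StreamWriter(new FileStream(path, FileMode.Append, FileAccess.Write));
    }

    public void Append(string operation, long nodeId, string? key = null, string? value = null)
    {
        var entry = new LogEntry(++_sequence, operation, nodeId, key, value);
        _writer.WriteLine(JsonSerializer.Serialize(entry));
        _writer.Flush(); // durability point; a real store would batch and fsync
    }

    public void Dispose() => _writer.Dispose();
}

// Writes hit the in-memory cache and the log; a separate process would later
// apply the logged entries to the graph data file.
class GraphCache
{
    private readonly ConcurrentDictionary<long, ConcurrentDictionary<string, string>> _nodes = new();
    private readonly WriteAheadLog _log;

    public GraphCache(WriteAheadLog log) => _log = log;

    public void CreateNode(long id)
    {
        _nodes[id] = new ConcurrentDictionary<string, string>();
        _log.Append("CreateNode", id);
    }

    public void SetProperty(long id, string key, string value)
    {
        _nodes[id][key] = value;
        _log.Append("SetProperty", id, key, value);
    }
}
```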

The issue now is getting data that has been pushed out of memory. I don't think this would be an issue, because we would be in control of the memory cache and can handle this (i.e. don't evict data that has pending writes - details to be worked out later...).

I think this is what neo4j does, from reading the docs you sent. They also ship these log files to the master node in an HA configuration.

What are your thoughts on this high-level approach?

SepiaGroup avatar Apr 29 '13 17:04 SepiaGroup

Hello,

The transaction log approach makes me think of SQL Server Log Shipping, but from memory to the graph DB. I wonder what caching system neo4j uses. Maybe memory-mapped files, memcached, or Redis could be used here. I really like this approach of caching and doing batch writes.

Also, I've read some more theory on graph data structures. Here they talk about index-free adjacency and provide 3 algorithms to deal with it. I'm sure both neo4j and Titan implement either an Adjacency List or an Incidence List. I think it is now time to read their source and see how they serialize it to their store. We will then understand how they use key/value stores to persist the data.
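For reference, a rough sketch of the two representations being discussed; the minimal types below are hypothetical, not taken from neo4j or Titan.

```csharp
using System.Collections.Generic;

// Adjacency list: each vertex keeps the ids of its neighbouring vertices.
// Simple and compact, but edge labels/properties have no obvious home.
class AdjacencyListVertex
{
    public long Id;
    public List<long> Neighbors = new List<long>();
}

// Incidence list: each vertex keeps references to its incident edges, and
// each edge is its own record that can carry a label and properties.
class IncidenceListEdge
{
    public long Id;
    public string Label = "";
    public long OutVertexId;
    public long InVertexId;
    public Dictionary<string, object> Properties = new Dictionary<string, object>();
}

class IncidenceListVertex
{
    public long Id;
    public List<long> OutEdgeIds = new List<long>();
    public List<long> InEdgeIds = new List<long>();
}
```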

After having read this article, I'm tempted to create a proof of concept that implements an Adjacency List or Incidence List with ESENT. Why am I so excited about ESENT? I haven't tried it yet, but reading the docs, it has a nice indexing mechanism and works a bit like a big table. I think it could be used to achieve Vertex Indexing and Edge Indexing. Reading the wiki, they say it can also store sequential data (the adjacency lists?). Anyway, I'm only speculating; I need to try it now.

About RavenDB, I found this comment from Oren Eini, where he says that it is not primarily designed for graphs, but could be used for it with custom bundles.

Loupi avatar Apr 29 '13 23:04 Loupi

The transaction log is very similar to SQL Server's, but in this case it will work very well - hence why neo4j does this.

I am not sure that we would need to use mapped files or any other caching implementation at this time. I am thinking that we just have a pointer to the head of the graph and the rest linked off of that. To get to nodes by id, we have a key/value dictionary that points to the objects.

I think the approach to go with is an incidence list, to start with.

On the storage using ESENT: ESENT is the new incarnation of the JET DB built into Windows. I know Active Directory uses it. When it was called JET (many years ago) it was a modified version of the Access DB. To me, I am not too concerned about what method is used to write to the drive, as the methods will be an implementation of an interface. This will allow many implementations with no core code changes. If you want to learn ESENT - go right ahead.

I may have time this week to start on a proof of concept for the graph objects and transaction logs. What do you think about starting a new git repo for this work?

SepiaGroup avatar May 01 '13 19:05 SepiaGroup

Not sure if you guys ever came across https://github.com/cosh/fallen-8
It's a C# in-memory graph database.

mickdelaney avatar May 02 '13 19:05 mickdelaney

Thank you Mick for this link. I will have a look at it. I've been reading Titan's file system source in my spare time this week. The guys at Thinkaurelius did a wonderful job, and the code is all documented.

Michael, I do agree with you on the abstraction of the filesystem, and on the new repo too. Do you think we should host it here, at SepiaGroup, or anywhere else?

I've started playing with ESENT, and if I work hard this weekend, I may be able to commit a proof of concept. I'll then adapt it to fit the FS interface. I think in a couple of weeks, if everything goes well, we will be in a good position to integrate the transaction log with the FS.

Loupi avatar May 03 '13 01:05 Loupi

Mick, again, thanks for the link - that looks very good!

Louis, you may also want to take a look at how he stores his data. He uses an API that he developed. Take a quick look at this: http://www.slideshare.net/HenningRauch/graphdatabases - slide 73 seems interesting, if true. However, it does not look like this is under active development.

I would like it on SepiaGroup - the name sepia is Latin for a cuttlefish (also the color brown), which has the ability to change shape, color, and texture (really an amazing animal if you ever get the chance to dive and see one), kind of like a schema-less database (hence the reason I came up with the name). However, I am not going to make a stink about it if you are willing to be a partner; otherwise I say we start a new one. Let me know and I will build a repo.

SepiaGroup avatar May 03 '13 03:05 SepiaGroup

I looked a bit at https://github.com/cosh/fallen-8/blob/master/Fallen-8/Persistency/PersistencyFactory.cs. The way I understand it, it is writing/reading a whole graph at once. I could not find a query language, apart from some stuff in the algorithms folder. The fact that the whole graph is in memory could maybe explain the performance graph on slide 73.

I'm OK with hosting it on SepiaGroup. Cute fish! There is an expert diver at my job; I'm going to give him this info, I'm sure he is going to like it. :)

Loupi avatar May 03 '13 05:05 Loupi

Henning Rauch told me he's rewriting fallen-8 in C++, but that if any bugs appear in the C# version he'll fix them, so he's still active in the space. It might be worth including him in your discussions - probably some cross-pollination.

mickdelaney avatar May 03 '13 07:05 mickdelaney

I just sent him an email with this thread; hopefully he'll jump in with some thoughts...

mickdelaney avatar May 03 '13 07:05 mickdelaney

Morning guys!

My name is Henning Rauch. I'm the former head of R&D of sones... so if you have questions concerning that, I can give you answers in any detail. Furthermore, I built Fallen-8 after leaving the company.

(I'm going to write something about those two products right now...)

Cheers, Henning.

cosh avatar May 03 '13 07:05 cosh

sones GraphDB: a GraphDB that aimed to kick the ass of all existing databases :). Well, to make a long story short: it didn't work out that well. It was separated into two parts, a community edition and an enterprise edition. The only difference between those was the persistent filesystem. The main feature of the sones GraphDB was the nice separation of all layers (service, graph, query language, and filesystem).

cosh avatar May 03 '13 07:05 cosh

Fallen-8: this project reflects my learnings from the sones GraphDB. Instead of trying to create a "one size fits all" solution, I created a product for a niche. Its main focus is analytics. That's why it's in-memory. BUUUT it has some kind of checkpointing functionality, so at any point in time the user is able to create something like a savegame :). This action should be as fast as possible. I did a lot of consulting in 2012 on this project and developed nice services on top of it.

cosh avatar May 03 '13 07:05 cosh

The new Fallen-8: it's written in C++ and will be visible soon. I decided to use the MIT license again. It will be faster, consume even less memory, and support some other nice features. For me that's the next baby step towards a distributed in-memory graph database. This is my ultimate goal, and everything will be as free as possible.

cosh avatar May 03 '13 07:05 cosh

@Loupi concerning persistency: you are absolutely right. It's totally in-memory. No evil caching. The benchmarks you are referring to used the "strong caches" of neo4j, so I tried to have everything in memory there too. BUT the numbers are saying that caching is not the same as in-memory (Captain Obvious :) ).

cosh avatar May 03 '13 07:05 cosh

@SepiaGroup Is it true? Yes it is. The number of traversals per second is still growing. I convinced some companies to use it, and they are really happy with it. But I need to repeat it: its focus is analytics, so be sure to use something persistent underneath and create a fast ETL job.

cosh avatar May 03 '13 07:05 cosh

@cosh Hi Henning, are speed and memory consumption the only reasons why you rebuilt Fallen-8 in C++, or are there also other factors? What are your thoughts on creating a graph database on top of some fast K/V store like Redis, for example?

yojimbo87 avatar May 03 '13 08:05 yojimbo87

@yojimbo87 Hi. Those were the main reasons for me. The architecture of the new F8 will go in the same direction as you described. One difference: I'm not going to use Redis. I'm using my own in-memory column store, which is in my opinion perfect for my requirements. Besides that, I would like to use other low-level libs which allow me to do RDMA to extend F8 to more than one node.

cosh avatar May 03 '13 09:05 cosh

Hi Cosh, nice to meet you. I really appreciate your presence here. I like how sones abstracts all layers of a graph DB system. Also, by looking at the sones source I discovered Irony, which looks very interesting. I see that sones has GraphQL plugins too. It is great and educational to see different ways of implementing a query language in a graph DB (comparing with Gremlin here).

I'm curious about the enterprise version of sones: how does it store the graph on disk?

I'm tempted to implement a blueprints-fallen-8-graph with Frontenac. What do you think about it?

Loupi avatar May 03 '13 14:05 Loupi

Henning,

Can you explain why sones did not make it? From what I can find about them, they were well funded and had a good idea. I wouldn't like to make the same mistakes as they did.

SepiaGroup avatar May 03 '13 14:05 SepiaGroup

@Loupi Yep, the layers were great. It's necessary for proper testing and, of course, for their "enterprise" concept. Concerning Irony: this is one of the greatest libs I have used in .NET. Really, really great. It needs some time to get familiar with it, but in the end it works really well. We never had any bigger issues there, and the query language was one of the biggest pros of the sones GraphDB at that time. I designed big parts of the language, and if you are interested I could timewarp my brain into the past and see how I can help you. Concerning storing on disk: there were multiple approaches to that challenge. The idea was to create a revisioned, multi-purpose, and distributed file system. The last version reused ideas from RDBMSs... i.e. paging, multiple layers... The performance was quite OK in the end, but not as good as our competitors'. If you want more info, contact me.

Concerning blueprints-fallen-8-graph: I would be honored if you would like to do that and would support you as much as possible. What would be the effort?

cosh avatar May 03 '13 14:05 cosh

@SepiaGroup They were well funded and had great ideas... BUT:

  1. technology was not focussed
  2. founder-internal-problems
  3. lost too many POCs
  4. overselling

In the end they gave me the beautiful opportunity to find my passion. That's why I'm still very proud of that part of my life.

cosh avatar May 03 '13 15:05 cosh

Henning,

My goals in building a .NET graph DB would be as follows:

  1. A graph DB as functional as neo4j; not a port of neo4j, but something that is as functional as neo4j while being .NET centric.
  2. A query language that is LINQ centric, strongly typed, and intuitive.
  3. Well designed and fast.
  4. Can run on Windows and Mono, and has backend stores optimized for different platforms (Windows, Amazon, Azure, etc…).
  5. Implements well-established interfaces, Blueprints, etc.

Those are the main items (I am sure I am forgetting a few).

That said, Fallen-8 satisfies a few and sones satisfies a few as well, but neither satisfies all. I really like your data model, but I am not sure it will be easy to modify it so that when a node/edge/property gets created/updated/deleted it is stored to disk. You can correct me if I am wrong on this.

At a high level, what I am thinking is that when a node/edge/property is created/edited/deleted, an entry would be written to a transaction log that very quickly writes it to the end of a log file. Then another process would read these logs and apply them to the graph data store. This would survive a fault as well, because when the system starts again it will continue processing the logs. Any nodes that have been created/edited would be in memory, so there would be no need to read from the data store again; this allows the updates to the data store to happen at a slower speed. I know that neo4j works something like this. I am wondering if your data model would be well suited for something like this. Also, we could abstract the storage interface so that we could have several data storages without affecting the core code. What are your thoughts on this approach? If you have a better idea, I would like to hear that as well.

Thanks a lot for your insight. Michael

SepiaGroup avatar May 03 '13 20:05 SepiaGroup

  1. Do you want to do this for fun or business?
  2. Your transaction approach is definitely valid and should be easy to implement. I've been asked by a customer to develop something like this: a simple write-ahead log. This would fit the needs of many people. It has to be asynchronous, of course.
  3. I would reuse the plugin management of F8 and the services.
  4. You could also reuse the F8 kernel and change the create/update/delete methods to support that WAL. And there must be a global flag that states whether the database is sane or not. If not, you would have to replay the WAL.
  5. The LINQ stuff should be implemented on top of your kernel.

In the end it's your decision, but F8 might bring you some low-hanging fruit, and I already know 1-N customers who would like to have it.

Cheers, Henning

cosh avatar May 03 '13 21:05 cosh

I would like to do it for business, but at the beginning I don't know how much demand there is for this in .NET. Is your customer a paying client? I am a contract developer and have had my own company for the past three years. I am always looking for more work - mostly .NET.

I will look more into your data model this weekend and let you know if I have any questions.

With the query language - I agree it would be on top of the kernel, but the kernel would need to have the data in a usable format.

I think that F8 would be a good place to start, modifying it when needed. Also, having you as a resource is a huge help.

Thanks

SepiaGroup avatar May 03 '13 21:05 SepiaGroup

Yes, they are paying for F8 service development. Meaning I implemented a lot of these: https://github.com/cosh/fallen-8/blob/master/Fallen-8/Service/IService.cs example: https://github.com/cosh/fallen-8/blob/master/Fallen-8/Service/REST/AdminServicePlugin.cs --> https://github.com/cosh/fallen-8/blob/master/Fallen-8/Service/REST/AdminService.cs

Contact me if you want to know how to satisfy a customer with F8 :)

Cheers, Henning.

cosh avatar May 03 '13 21:05 cosh

@cosh

For the query language, I would like it to satisfy everyone's requirements. We discussed this topic a bit earlier, in the first comments of this issue. From that discussion we can see that everybody would like to use a fluent API with both C# and F#. This makes a lot of sense; a lot of .NET developers like these fluent APIs, and it could be the foundation of the query language.

Besides, I think that a scripting engine would add value here. I've had a lot of commercial success with IronPython. It is an easy language to learn and is well documented. Other scripting languages exist too. Simply put, I think those languages could simply call the fluent API. Sending scripts over the network to change business rules on the fly and executing requests on a service is one of my needs.
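To illustrate the hosting side of this, a minimal sketch using the IronPython hosting API; the `graph` object and the members the script calls are hypothetical stand-ins for whatever fluent API would be exposed.

```csharp
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

class ScriptingExample
{
    // Executes a Python script against an object exposed to the script scope.
    // "graph" can be any .NET object whose members the script may call.
    public static void RunScript(object graph, string pythonSource)
    {
        ScriptEngine engine = Python.CreateEngine();
        ScriptScope scope = engine.CreateScope();

        // Expose the fluent API entry point to the script under the name "g".
        scope.SetVariable("g", graph);

        // The script can now call into the .NET object, e.g.:
        //   v = g.AddVertex()
        //   v.SetProperty("name", "Bob")
        engine.Execute(pythonSource, scope);
    }
}
```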

Both neo4j and TinkerPop offer this feature through Cypher and Gremlin.

Looking at Fallen-8's architecture, I think this would fit as a plugin, right? What do you think about that? Would it be easier for users? I have no experience with Irony. Would it be a better fit here?

Loupi avatar May 03 '13 22:05 Loupi

On the blueprints-fallen-8-graph: after having looked at the fallen-8 source, I realised that it would not require much effort to integrate it with Frontenac. What I need to know is its supported features. If you could give me a list of bools for the properties of this class, that would be fantastic: https://github.com/Loupi/Frontenac/blob/master/Blueprints/blueprints-core/Features.cs

As I understand it, I will need to host fallen-8 inside the Blueprints graph like this: https://github.com/cosh/fallen-8/blob/master/Startup/Startup.cs

I'm sure more questions will come later.

Loupi avatar May 03 '13 23:05 Loupi

On fallen-8: like Michael says, I think that fallen-8 would be a solid foundation for our upcoming work. It is clean and performant, the codebase is not astronomic/unmanageable, and it was made by someone who worked on one of the few serious graph DBs in .NET. I like its plugin system, and looking at the links you provided, I noticed that we can even upload new plugins using the AdminService. Wow!

I'm curious about those savegame files. What is the best period for the ETL job in an HA system? What disk capacity and technology are best for it? Would backing up every ten minutes be OK? Could only the differences between 2 backup sets be written to disk, and would it be worth it?

Loupi avatar May 03 '13 23:05 Loupi

Louis,

I do see the benefit of a fluent API, like what I built for Gremlin in my API. However, what I built for Cypher uses lambda expressions; I parse the expression tree and convert the expression into Cypher syntax. I, like you, would like to support both, as well as multiple scripting languages - and don't forget you can host ASP as well. But I think the biggest benefit would come from a Cypher-like language using LINQ syntax. This would give a very .NET centric query language that C# developers would find natural. I don't know if you have used Entity Framework, but it does have some very interesting functionality, regardless of whether you like using it or not. Imagine you have a Cypher query, but your nodes and relationships are strongly typed and you are able to use the power of LINQ aggregates and other commands. You could also use data annotations that would provide context help for what relationships point to what nodes, etc. Your returned data would be loaded into the correct data classes, and you could just update the class and then update the graph. This would make data manipulation very easy and clean, with very little overhead, because .NET properly implements anonymous types - one of the great things about .NET. If using F# it gets even more streamlined. But in order for this to work, the graph data model needs to be in a form that works well with the traversal algorithm. This is something Henning would be a great asset to give pointers on. I am hoping that this weekend I can find enough time to really look deep into F8 and come up with a high-level design/implementation.
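To make the expression-tree idea concrete, here is a minimal sketch that translates a simple equality lambda into a Cypher-like WHERE fragment; the Person type and the translation rules are hypothetical, and only equality on a single property is handled.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical strongly typed node class.
class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

static class CypherTranslator
{
    // Translates e.g. p => p.Name == "Bob" into "n.Name = 'Bob'".
    public static string ToWhereClause<T>(Expression<Func<T, bool>> predicate)
    {
        if (predicate.Body is BinaryExpression { NodeType: ExpressionType.Equal } eq &&
            eq.Left is MemberExpression member)
        {
            // Evaluate the right-hand side to get the constant value.
            object value = Expression.Lambda(eq.Right).Compile().DynamicInvoke();
            string literal = value is string s ? $"'{s}'" : value?.ToString() ?? "null";
            return $"n.{member.Member.Name} = {literal}";
        }
        throw new NotSupportedException("Only simple equality predicates are handled in this sketch.");
    }
}

// Usage:
//   string where = CypherTranslator.ToWhereClause<Person>(p => p.Name == "Bob");
//   // where == "n.Name = 'Bob'"
```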

What are your thoughts on this? Thanks, Michael

SepiaGroup avatar May 03 '13 23:05 SepiaGroup

Michael, we come from the same world. I've been using both LINQ to SQL and EF in commercial solutions since their CTPs. I also played with custom code generation, with T4 templates reading edmx files. This is very powerful. Custom providers can also be implemented.

Looking at the TinkerPop stack, they have the Java equivalent of what you are describing here: Frames. It is the graph ORM of the TinkerPop stack. They use attributes to annotate the objects, and these attributes can even contain gremlin queries to fill collections.

I also think that a Cypher-like DSL is the way to go. I'm sure we'll find a lot of pointers inside the sones GQL source files too.

Loupi avatar May 04 '13 00:05 Loupi

Hi guys! I'll comment tomorrow.

cosh avatar May 05 '13 21:05 cosh

henning,

I have spent some time reviewing F8 and have a few questions - I also learned a few things too…

I have been reviewing the classes in the Model folder. These classes are the core elements of the graph DB. If I understand it correctly, you then use the BigList class to contain all of the graph elements that are read from/written to disk. BigList has a method GetElement that will search all elements to find the ID of the element sought. After you find the element you are looking for, I assume you then use some sort of traversal technique to get the other elements you want.

The EdgeModel class has references to the source/target vertices, while the VertexModel has references to all the incoming and outgoing edges. The in/out references are Lists of EdgeContainers, which organize the edges into types of edges by using the EdgePropertyId.

So if I have the above correct, my questions are:

Why are you using arrays to hold the elements in BigArray? I see you are using a two-dimensional array and shard the data, but couldn't you use a concurrent collection? Using a concurrent collection would reduce the need for all the locking code. Also, using a concurrent dictionary you could reduce some of the code used in searching for an element by id, and I think you could still come up with some form of sharding - an array of concurrent collections. I know that concurrent collections are slow, but when there are a lot of threads reading/writing they do perform well.

I have the same question for how the references to edges within the vertex class are stored: could a concurrent bag or dictionary be used instead of a List?

I did not find where you do your traversals; can you point me to where I should look?

Thanks for the help. Michael

SepiaGroup avatar May 09 '13 03:05 SepiaGroup

Hi Michael,

What a nice review :). BigList... I used concurrent data structures, but they had two drawbacks: too big, too slow. I had to optimize a lot for memory usage. And ConcurrentBag is the slowest thing on earth, if you ask me :). Of course I did a lot of benchmarks, but in the end it did not work out. For concurrency I use the class https://github.com/cosh/fallen-8/blob/master/Fallen-8/Helper/AThreadSafeElement.cs which implements a spin lock. This enables F8 to have multiple concurrent reads and only one write at a time.
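For readers unfamiliar with the pattern, here is a rough sketch of a reader/writer spin lock of the kind described; this is a generic illustration, not the actual AThreadSafeElement code.

```csharp
using System.Threading;

// Many concurrent readers, a single writer at a time, implemented by spinning
// on an interlocked counter instead of taking OS-level locks.
class ReaderWriterSpinLock
{
    // 0 = free, >0 = number of active readers, -1 = a writer holds the lock.
    private int _state;

    public void EnterRead()
    {
        var spin = new SpinWait();
        while (true)
        {
            int current = Volatile.Read(ref _state);
            if (current >= 0 &&
                Interlocked.CompareExchange(ref _state, current + 1, current) == current)
                return;
            spin.SpinOnce();
        }
    }

    public void ExitRead() => Interlocked.Decrement(ref _state);

    public void EnterWrite()
    {
        var spin = new SpinWait();
        // Wait until there are no readers and no writer, then claim exclusively.
        while (Interlocked.CompareExchange(ref _state, -1, 0) != 0)
            spin.SpinOnce();
    }

    public void ExitWrite() => Volatile.Write(ref _state, 0);
}
```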

List<EdgeContainer>... I made the assumption that the number of https://github.com/cosh/fallen-8/blob/master/Fallen-8/Model/EdgeContainer.cs would not be that big, and again: I had to optimize for size. A Dictionary<EdgeModel> would consume too much memory.

Concerning traversals... First things first:

  1. Get the starting vertex by a secondary index lookup, via ID, or via GraphScan.
  2. Traverse manually by calling TryGetOutEdge (https://github.com/cosh/fallen-8/blob/master/Fallen-8/Model/VertexModel.cs#L508 ) or TryGetInEdge (https://github.com/cosh/fallen-8/blob/master/Fallen-8/Model/VertexModel.cs#L543 )
     2.1 There you have to name the EdgePropertyID you are interested in.
     2.2 The edge property is identified by a UInt16 (again: because of size, and because I hate strings). This UInt16 can be seen as the name of the edge (like "Friends" or "Enemies").
     2.3 If there is an EdgeProperty with the interesting id, you get back true and, as an out-param, the ReadOnlyCollection<EdgeModel> (example: https://github.com/cosh/Fallen-8-Intro/blob/master/Fallen-8%20Intro/IntroProvider.cs#L116 ).
     2.4 If you used an incoming edge, you should proceed with the SourceVertex in the EdgeModel; otherwise go with the TargetVertex.
  3. Start again with step 2.

Usually I hide this complexity behind a service that is dedicated to exactly one task.

Additionally, you could use https://github.com/cosh/fallen-8/blob/master/Fallen-8/Fallen8.cs#L577 to calculate all shortest paths between two vertices. For that you need a ShortestPathPlugin (which I usually sell). Example: https://github.com/cosh/fallen-8/blob/master/Fallen-8/Algorithms/Path/BidirectionalLevelSynchronousSSSP.cs

I hope I could answer some of your questions.

Cheers, Henning.

cosh avatar May 10 '13 04:05 cosh

@Loupi You can try out the Fallen8 intro if you want to get an impression of the traversal speed and the checkpointing-mechanism. Please have a look at this: https://github.com/cosh/Fallen-8-Intro

After you are finished with the benchmarks, you could use the Admin-Service to execute as many checkpoints as you want. All GraphElements and all secondary indices will be saved at once, in multiple threads. So: the more CPUs and faster disks you have, the faster this will be. In this F8 intro it usually saves about 2,000,000 edges or vertices per second.

cosh avatar May 10 '13 04:05 cosh

@Loupi concerning the query language... I'm all for a query language, but I did not want to have it in the core. As you know, I built big parts of the sones GraphQL, and my experience is that this really takes a lot of time to implement. At the time I built F8 I did not have the time to do this.

cosh avatar May 10 '13 04:05 cosh

Hi guys, sorry for my lack of presence these last few days; my laptop/dev machine turned into a brick! :( http://forums.lenovo.com/t5/IdeaPad-Y-U-V-Z-and-P-series/y580-Black-Screen-of-Death/td-p/798003

All my projects are on hold and it sucks! I'm in the process of switching back to a temporary computer.

@cosh Thank you for this link on the Fallen8 intro. I'm going to benchmark it with one of my production datasets. I'm under the impression that, in its current form, fallen-8 would be a better fit than SQL Server for my business cases.

Loupi avatar May 14 '13 14:05 Loupi

Sorry to hear about your computer - bummer!

Henning,

I have been discussing F8, and some issues around concurrency, with a friend of mine. He has suggested a look at the Multiversion Concurrency Control (MVCC) method. I have limited knowledge of this method, but find it very interesting and workable. Do you have any insight into this?

Thanks, Michael

SepiaGroup avatar May 14 '13 16:05 SepiaGroup

@SepiaGroup A friend of mine once implemented MVCC for a part of a graph database. The overall conclusion was that it works quite well, but the memory usage was insane... That's the first reason why I would not implement it. The second one is the focus/scope of F8. It sacrifices isolation in favor of performance. That's why it's currently capable of doing only one write action at a time (due to the spin lock I created). For my use cases (80% read, 10% insert, 10% update) this behavior was OK. If you use F8 for a different scenario, where you constantly insert/update/remove more than you read, an MVCC implementation would be valid (I like the optimistic approach).

Cheers, Henning.

cosh avatar May 14 '13 17:05 cosh

@Loupi I like. If you need a jumpstart, ask me.

cosh avatar May 14 '13 17:05 cosh

I am looking for something more generic than what F8 was designed for, more along the lines of sones/neo4j. Using MVCC would satisfy isolation nicely. Duplication of the objects may not be that bad if only the objects involved in the transactions are duplicated. This would mean that nodes/edges would need to use something like a linked list or some other structure that would not require the entire collection to be locked for adding/deleting elements.

Just curious, what did you use at sones? Do you have a better idea?

Thanks, Michael

SepiaGroup avatar May 14 '13 17:05 SepiaGroup

Louis,

I created a simple C# STM that I think you may like. This is my first crack at it, so if you find a bug let me know: https://github.com/SepiaGroup/STMNet

michael

SepiaGroup avatar Jul 05 '13 17:07 SepiaGroup

Hi everyone!

I work for a Norwegian software company developing a .NET-based CMS. One of our main advantages over our competitors is that we have a very flexible data model, with full support for relations etc. Basically, we can store graph data. But the graph data is stored in a SQL database, so it's not optimal for a number of queries that a graph database excels at.

We've been looking for a .NET graph database for almost two years now, but as you know, there are limited options. We've talked to Cloudgraph, which was supposed to be out in Q1 2012 with a beta, but nothing seems to have happened with it for a long time. We also worked briefly with BrightstarDB to create a graph API on top of their triple store, but that didn't work out. We looked at Fallen8, but we don't want an in-memory store.

We then looked at writing our own graph database. Before we abandoned the idea, the plan was to use either ESENT or a native .NET key/value store (RaptorDB, RazorDB, STSDB, etc.) and create a graph API on top.

A few months ago we started to work together with VelocityDB to help define a graph API on top of VelocityDB. They have worked very hard, and we're currently testing an early version. Frontenac will be supported in the near future. I have also asked Mats, the founder of VelocityDB, to take a look at this thread.

I think Frontenac/Blueprints is great, but I also think it is crucial that we get a standardized graph query API for .NET. I see some of you have talked briefly about it above, but do any of you have concrete plans? Or should all of us try to cooperate on something?

Vidar

Radiv3 avatar Jul 14 '13 21:07 Radiv3

Vidar,

I have been working with neo4j for some time now and have always felt that a similar DB in .NET would be very useful. The DB I would like to develop would have some key functionality that neo4j and most other DBs don't provide (why build something that already exists?). I would like to see a DB that can handle concurrent updates and notify you if an update cannot take place because the data you were working with has been modified. If your transaction is aborted, you then inspect the reason why this happened and decide if you would like to retry the entire transaction. Trying to do this in neo4j is very difficult, if not impossible, with the current HTTP interface. So to solve this issue I have developed a .NET C# STM. When a transaction is aborted using the STM, you will receive a Conflicts collection that you can run through to decide if you would like to retry the entire transaction. I am still developing this, but it does work and seems to be performing very well. The STM uses no locks, so it will never deadlock, but is still atomic. Once this is complete I will be developing a graph DB using the STM as its transaction core. There are some object and index size restrictions within .NET that make a large in-memory DB a bit challenging, but I think I have solved these issues and will be testing it later this week.
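To illustrate the usage pattern being described, here is a hypothetical retry loop; the Conflict/TransactionAbortedException names are illustrative only and not necessarily the actual STMNet API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical STM-style types, named only for illustration.
class Conflict
{
    public string Key = "";
    public object ExpectedValue;
    public object ActualValue;
}

class TransactionAbortedException : Exception
{
    public IReadOnlyList<Conflict> Conflicts { get; }
    public TransactionAbortedException(IReadOnlyList<Conflict> conflicts) => Conflicts = conflicts;
}

static class StmRetryExample
{
    // Runs the transactional work; if it aborts, inspect the conflicts and
    // decide whether to retry the entire transaction.
    public static void RunWithRetry(Action work, Func<IReadOnlyList<Conflict>, bool> shouldRetry, int maxAttempts = 3)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                work();
                return; // committed
            }
            catch (TransactionAbortedException ex) when (attempt < maxAttempts && shouldRetry(ex.Conflicts))
            {
                // Conflicting data was modified by another writer; loop around
                // and retry the whole transaction with fresh reads.
            }
        }
        throw new InvalidOperationException("Transaction could not be committed after retries.");
    }
}
```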

I know that this does not answer your question or solve your immediate need for a db, but I would be interested in working on a standard query api. I do like cypher a lot but I think in .net you can get a lot more out of it using LINQ.

Let me know what you think

Michael

SepiaGroup avatar Jul 15 '13 13:07 SepiaGroup

STM seems like an interesting concept, but not a big concern in our use-case. We'll be doing mostly reads, so our main focus is max traversal speed and scalability.

Glad to hear you're up for working on a standard query API. Anyone else?

Radiv3 avatar Jul 17 '13 09:07 Radiv3

Thanks, Loupi, for creating Frontenac Blueprints. I have been using it for a few weeks now. Creating a standard interface for graph databases in .NET is a good idea. I have most of the API implemented now with VelocityGraph, and I am looking at the tests. Once we have VelocityGraph as a NuGet extension, we can add the VelocityGraph tests to the Blueprints tests, if you don't mind? See https://github.com/VelocityDB

For querying graphs in .NET, it would be nice if we could integrate with LINQ. I see that there is a default query mechanism in your code; I have not tested it yet, but it could be a start.
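
For illustration, here is a rough sketch of what plain LINQ to Objects over a Blueprints graph could look like. It assumes the Frontenac interfaces keep the Blueprints names (IGraph/IVertex with GetVertices() and GetProperty()); treat the exact signatures as assumptions rather than the published API.

using System.Collections.Generic;
using System.Linq;
using Frontenac.Blueprints;

public static class LinqQuerySketch
{
    // Plain LINQ to Objects over the vertex stream; no graph-specific query language needed.
    public static IEnumerable<IVertex> AdultsNamedMarko(IGraph graph)
    {
        return graph.GetVertices()
                    .Where(v => Equals(v.GetProperty("name"), "marko"))
                    .Where(v => v.GetProperty("age") is int && (int)v.GetProperty("age") > 18);
    }
}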

VelocityDB avatar Jul 21 '13 01:07 VelocityDB

Hi guys, it's been a while. My laptop is now ok, and I'm at the end of my summer vacations! :)

I have been continuing development of an ESENT graph implementation, though not very intensively (the call of the bass guitar). I've got the base in place to set up tables with indices and so on, and to create vertices and edges. I've also managed to design something that will be queryable, powered by ESENT under the hood. I'm still planning to fully implement IIndexableGraph, IKeyIndexableGraph and ITransactionalGraph.

Michael, I've had a look at your STM. It's nice, and in line with what you mentioned earlier in the post. I like the range of options that are possible with your Transaction object. Very interesting; I'll try to fit it in with my stuff, as I see good use for it.

It's great to see you again, Radiv, and nice to meet you, Mats. I would be honored to host the Velocity tests. I'll add some info pointing to VelocityDB on Frontenac's main page too.

About the query language: the query interfaces and comparators of Frontenac (IQuery, IGraphQuery, IVertexQuery, Compare) are simple wrappers that can be used around the API of specific graph databases. You can have a look at QueryTestSuite.cs to see how they are meant to be used. An in-memory implementation for TinkerGraph is in DefaultGraphQuery.cs.

The TinkerPop query language, Gremlin, uses Pipes (a bit like LINQ) and Groovy (like IronPython) to do its job. From the discussion we had about this, it would be better for Frontenac to use LINQ instead of porting Pipes, and have a query language à la Cypher, a fluent API, etc. No progress has been made in this area yet.

Loupi avatar Jul 21 '13 02:07 Loupi

Hello,

A big step forward was achieved today: the first commit, with tests, for Grave, a graph database engine backed by ESENT and Lucene.NET.

GraveGraph uses Managed ESENT to store its data. Currently, a database with 3 tables is created:

- A configuration table (index names/types are kept here)
- A vertex table
- An edge table

The vertex table has an auto id column. Other columns are created on the fly when necessary. For property data, a text column is created with the property name. For edge relation data, an in or out MultiValue column is created with the label name and direction. Edge ids will be stored here for each vertex (index free adjacency).

The edge table has an auto id, an in vertex id, an out vertex id and a label. Other columns can be added on the fly like vertices.

When properties are added to a vertex or edge, they are serialized/deserialized with JSON.NET. This makes it easy to serialize/deserialize any object (int, short, YourCustomObject). Other custom serializers can be used.
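
As a small illustration of that round trip with JSON.NET (the Address type and the TypeNameHandling setting are assumptions for the example, not necessarily what Grave uses):

using System;
using Newtonsoft.Json;

public class Address
{
    public string City { get; set; }
    public string Street { get; set; }
}

public static class PropertySerializationSketch
{
    public static void Main()
    {
        // Embedding type information lets an arbitrary object come back as its original CLR type.
        var settings = new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All };

        object value = new Address { City = "Quebec", Street = "Grande Allee" };
        string column = JsonConvert.SerializeObject(value, settings);       // what would go into the text column
        object restored = JsonConvert.DeserializeObject(column, settings);  // comes back as an Address

        Console.WriteLine(restored.GetType().Name); // Address
    }
}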

GraveGraph implements both IKeyIndexableGraph and IIndexableGraph. It uses Lucene.NET under the hood, with NrtManager (near-real-time search) to achieve optimal indexing performance in multithreaded scenarios. A concept of generations is used to keep the index fast: only the writer sees its modifications immediately. Searchers are distributed to different threads using a mechanism of tokens.

I also created an object analyzer based on ObjectDumper but using Fasterflect. This analyzer inspects the objects that are inserted into the graph and automatically indexes them in Lucene. You can also easily provide custom object indexers for specific types.

GraveGraph uses cursors and lazy loading to iterate over its tables and search results.

Castle Windsor was used under the hood to create a GraveFactory to instantiate graphs and easily manage their dependencies.

What's next:

- Implement IQuery, IGraphQuery, IVertexQuery, then create a LINQ graph query language!
- Perform more tests on Grave with big data sets.
- Add a strategy to pick different data models for storing edges per vertex, depending on their number.
- Add transaction support.
- Add more helper methods to GraveFactory.

Loupi avatar Aug 25 '13 21:08 Loupi

Louis,

I am just getting around to reviewing your new work. If I understand it correctly, you are storing and traversing all data within the data store, i.e. not using any in-memory structures for the graph. Is this correct, or did I miss something? I am curious about the performance of this, since everything would have to go through the JET DB. On the upside, you don't have to worry about the size of the DB. Like I said, I just started looking into your code. Also, can you please check your commit? I think you may have missed a few libs.

michael

SepiaGroup avatar Aug 29 '13 16:08 SepiaGroup

Hello Michael,

Long time no see! :)

You are right, I'm not using any in-memory structure (yet), and I traverse the data store using the JET API. I'm also curious about the performance, which is why I'm going to create a better benchmark, with a big data set, in the coming weeks. In theory, I could go up to 16 TB of data with ESENT.

Besides, JET can be configured to perform some caching, but I still have to experiment with this. A custom caching layer could also be added in the future.
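
For reference, a minimal sketch of what tuning the JET page cache through ManagedEsent could look like (the numbers are arbitrary and only meant as an illustration):

using Microsoft.Isam.Esent.Interop;

public static class EsentCacheSketch
{
    public static void Configure()
    {
        // Cache sizes are expressed in database pages and must be set before the instance is initialized.
        SystemParameters.CacheSizeMin = 16 * 1024;
        SystemParameters.CacheSizeMax = 64 * 1024;
    }
}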

I will have a look at the commit tonight to see what's missing.

Loupi avatar Aug 29 '13 19:08 Loupi

Louis,

I have been trying to find time to complete the work I started on the in-memory graph structure. My approach is a bit different from yours, since I am using my STM to handle the transactions. In .NET there are some limits on array sizes and large memory usage, so I was thinking of using a modified version of what Henning did (a sharded array). This will work well but requires more coding time than I have available at the moment, so I am using a simple dictionary for a proof of concept. In the dictionary, each element will be an STM object containing a Node or Relationship (for example, nodes = Dictionary<int, Stm<Node>>, where int is the id of the node). This will allow me to have transactions with no locking and should be able to handle highly concurrent operations with few bottlenecks. There is also another dictionary to hold the relationships. I am going to look more into how you implemented nodes and relationships and see if I can use your node/relationship classes. With this approach I was thinking of placing each committed transaction on a stack and then having a process write it to disk. Not sure if I can fit this into your code at the moment. Any thoughts on this?
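
To make the proposed layout concrete, here is a rough sketch; the Node/Relationship shapes and the Stm<T> wrapper are placeholders for the structures described above, not an actual implementation:

using System.Collections.Generic;

// Placeholder shapes only: illustrates the Dictionary<int, Stm<Node>> idea,
// not the actual STMNet or Frontenac types.
public class Node
{
    public int Id;
    public Dictionary<string, object> Properties = new Dictionary<string, object>();
    public List<int> InRelationshipIds = new List<int>();
    public List<int> OutRelationshipIds = new List<int>();
}

public class Relationship
{
    public int Id;
    public int StartNodeId;
    public int EndNodeId;
    public string Type;
    public Dictionary<string, object> Properties = new Dictionary<string, object>();
}

// Each entry is wrapped in an STM cell so concurrent transactions can update it
// optimistically and detect write conflicts at commit time.
public class Stm<T>
{
    public T Value;
    public long Version;
}

public class InMemoryGraphStore
{
    public readonly Dictionary<int, Stm<Node>> Nodes = new Dictionary<int, Stm<Node>>();
    public readonly Dictionary<int, Stm<Relationship>> Relationships = new Dictionary<int, Stm<Relationship>>();
}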

SepiaGroup avatar Aug 29 '13 20:08 SepiaGroup

Michael,

I made a basic transacted graph class using ESENT, which supports 7 levels of transactions. If I understood the ESENT docs correctly, it supports multi-versioning: "the only updates a transaction encounters are those made by it". When a conflict occurs, the transaction can retry or cancel its work (per level).

Because of multi-versioning, transactions must be short-lived and commit frequently; otherwise memory will grow.

I think my class would work with this out of the box; the only thing I have not implemented yet is a kind of stack for the indexing service. Lucene is not like ESENT: there is only a single writer and no real transaction support. With multi-level transactions, I need something to store the items that need to be indexed on commit. I'm still wondering how to do it. What if, at a certain transaction level, an index search is performed that depends on an object already committed at an upper level? Maybe I'll commit to Lucene at level 0 only, to keep things simple, but with a limitation.
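
A minimal sketch of levelled ESENT transactions through ManagedEsent (session setup omitted; the grbit values are just examples):

using Microsoft.Isam.Esent.Interop;

public static class EsentTransactionSketch
{
    public static void Run(Session session)
    {
        using (var outer = new Transaction(session))
        {
            // ... write vertices/edges at level 1 ...

            using (var inner = new Transaction(session))
            {
                // ... work at level 2; disposing without Commit rolls back this level only ...
                inner.Commit(CommitTransactionGrbit.None);
            }

            outer.Commit(CommitTransactionGrbit.LazyFlush);
        }
    }
}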

If you want to reuse Grave, there are different approaches:

  1. Inherit from WrappedGraph, WrappedVertex and WrappedEdge to wrap a Blueprints graph. This way, you can wrap Grave or any other Blueprints graph and perform operations on the STM where you need to. I think this would be the easiest approach. You can simply copy-paste the files in blueprints-core\Utils\Wrapper\Wrapped and adapt them to your needs.

  2. You can also inherit directly from GraveGraph, GraveEdge, GraveVertex. But then you'll be bound to Grave only.

I have to go, cya later

Loupi avatar Aug 29 '13 23:08 Loupi

Hello,

Just to let you know that I released Blueprints 2.3.8 today.

It includes the Gremlinq language (not yet complete). It is based on both Gremlin and LINQ to Objects. Proxy functionality is also available to map vertices and edges to business entities in Gremlinq queries. Documentation is available here.

The available operators are In, InE, Out, OutE, Both, Loop, As, Proxy, Wrap, OfType, AddVertex, and AddEdge.

I'm going to add more operators in the coming weeks, based on Gremlindocs.

Oh, and Gremlinq works with F# too.

Loupi avatar Mar 22 '14 18:03 Loupi

Great work Loupi!

In the Wiki you mention that you can't access the relation properties directly yet. Do you have an estimate for when that will be ready?

Radiv3 avatar Mar 22 '14 23:03 Radiv3

Hello Radiv3, nice to see you again.

I first wanted to see whether or not people like this idea of using IEnumerable, KeyValuePair and interfaces to model relations.

I did not want to write my own proxy library for Gremlinq, but rather to abstract this functionality so that it can be customised. For this purpose, I created IProxyFactory and implemented a default version based on Castle DictionaryAdapter from the Castle.Core NuGet package.

Castle DictionaryAdapter is not very well documented, but from what I understood of it, it should be possible to detect the property type and call a custom function to retrieve its value. I'd also like to know whether people like Castle DictionaryAdapter or not, because the work that follows depends heavily on it.

More information than the label and types is required to perform an edge query. A convention based on the property name could be used to specify the direction, like Childs, InChilds, InEChilds, OutChilds, OutEChilds, BothChilds, BothEChilds.

Another solution would be to use C# attributes to decorate relation properties with their direction. The label could also be overridden this way.

Personally, I would prefer a convention based on the property name. I tried not to use attributes when creating the Proxy features. I'd like your input on that too.
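
A hypothetical example of what the name-based convention could look like (nothing here is implemented yet; the interfaces and prefixes simply spell out the proposal):

using System.Collections.Generic;

// Hypothetical convention: the prefix encodes the direction (In/Out/Both), an optional
// E asks for edges instead of vertices, and the rest of the property name is the label.
public interface IChild { }

public interface IPerson
{
    string Name { get; set; }

    IEnumerable<IPerson> Childs { get; set; }      // default direction for the "Childs" label
    IEnumerable<IPerson> InChilds { get; set; }    // incoming "Childs" vertices
    IEnumerable<IPerson> OutChilds { get; set; }   // outgoing "Childs" vertices
    IEnumerable<KeyValuePair<IChild, IPerson>> OutEChilds { get; set; } // outgoing "Childs" edges with their vertices
}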

When we're all set on this, I don't think it would require that much work (provided Castle DictionaryAdapter can do it). If I'm optimistic and everything goes well, maybe less than 1 week.

Oh, and I forgot to mention that Frontenac.Blueprints is available on Nuget!

Loupi avatar Mar 23 '14 01:03 Loupi

Our main concern is that it must be relatively easy to support for people implementing Blueprints/Frontenac. We're dependent on VelocityDB/VelocityGraph being able to support the solution you choose, and I think it is important for the overall adoption of Gremlinq.

Having said that, we've had a graph data model in Webnodes for many years (we're adding a graph database in order to support queries that are better suited to traversal, but the data model will be more or less the same) that has worked very well for us. I don't think it's very well suited for Gremlinq, but it might give you some inspiration.

We have a relation property object (NodeRelation(s)Property) that contains a number of methods. For example, we have a parent/children relation that is used for the traditional CMS content tree. By having an object between the node and the related data, we can add several methods to it. Note that we have no concept of edge objects.

node.Children.Get<IContent>() <- Returns a list of the child content nodes
node.Children.Contains(int nodekey)
node.Children.IsEmpty()
node.Children.GetIds() <- Returns a list of node ids
node.Children.Add()
node.Children.Remove()
node.Children.Count()
node.Children.Query<T>() <- Filter the related nodes.

If we change the Gremlinq example to be a bit inspired by our API, I would guess it would look something like this:

public interface IPerson
{
    string Name { get; set; }
    int Age { get; set; }

    RelationProperty<IPerson> Father { get; set; }
    RelationProperty<IPerson> Mother { get; set; }

    RelationsProperty<IChildren, IPerson> Children { get; set; }

    RelationsProperty<IJob> Jobs { get; set; }
}

IPerson person = someobj.GetPerson();
IPerson father = person.Father.Get();
IEnumerable<IJob> jobs = person.Jobs.Get(); // or GetBoth
if (person.Jobs.Count() > 2) { // person.Jobs.CountIn(), person.Jobs.CountBoth() etc
    // do something
}
IEnumerable<IJob> jobs = person.Jobs.GetIn();
IEnumerable<IChildren> childEdges = person.Children.GetEdges(); // GetBothEdges();
IEnumerable<IChildren> childEdges = person.Children.GetOutEdges();
IEnumerable<KeyValuePair<IChildren, IPerson>> keyValuePairs = person.Children.GetEdgesAndVertices();

Radiv3 avatar Mar 25 '14 14:03 Radiv3

Hello,

Blueprints 2.3.9 has just been released. It now supports relations in proxy objects. You can see an example here

On your main concern, I have good news. I've been working hard to provide solid defaults in Gremlinq. Blueprints-compliant graph databases should all work out of the box with Gremlinq. The new relations feature has been tested with both the TinkerGraph and Grave graph databases.

Loupi avatar Mar 29 '14 22:03 Loupi

Hi Loupi,

I will update VelocityGraph with your latest release. Keep up the good work!

Cheers!

Mats

VelocityDB avatar Mar 30 '14 00:03 VelocityDB

Hi guys, this discussion is really interesting, but is it dead now?

Ponant avatar Oct 21 '16 10:10 Ponant

Hello Ponant,

It's sleeping but not dead. I have plans to integrate the Cypher language into Frontenac, following guidelines from the openCypher project. I'm currently waiting for a reference implementation.

https://groups.google.com/forum/#!topic/opencypher/CdlBJAEOqXk

Loupi avatar Oct 21 '16 13:10 Loupi

Hi Loupi!

How is the OpenCypher integration going? Do you intend to do any more work on Frontenac?

vlangber avatar Jan 21 '19 22:01 vlangber

Hello Vidar!

Have you seen https://github.com/Loupi/node-cypher-parser ? I wrote a Node.js lib, with a C++ addon that uses libcypher-parser and returns the AST as JSON. There are also corresponding TypeScript models for each Cypher keyword. I also ported Frontenac to TypeScript, but never published it.

There is a limitation, as libcypher-parser can only be compiled on Unix-like systems. It uses PEG, a C parser generator, which generates C code that is incompatible with Windows.

But three weeks ago, I found a Windows port of PEG. I grabbed the libcypher-parser code, created a VC++ project, regenerated the parser, and compiled a working Windows binary.

Frontenac, in both C# and TypeScript, is now at the same step: creating an execution plan library based on the Cypher parser. Since porting C# to TypeScript is easy, since I develop faster in C#, and since it would now be very easy to call the C++ module from C#, Frontenac is going to get more attention.

bealoui1 avatar Jan 22 '19 00:01 bealoui1

I saw that project the other day. Very cool!

Looking forward to seeing a fully working example with Frontenac and OpenCypher. We have a project in the advanced planning stages where we're considering using Frontenac. If it gets support for OpenCypher, that would be a big plus for us.

vlangber avatar Jan 22 '19 22:01 vlangber