eliasdb

Why not BoltDB or something?

Open abourget opened this issue 9 years ago • 7 comments

Was there a need to have zero + zero dependencies?

I'm curious about the story behind this DB.. Why it was written and what sets it apart from Cayley and Dgraph?

abourget avatar Sep 25 '16 01:09 abourget

Hi there,

thanks for asking. EliasDB is actually a rewrite of a similar DB design which I previously did in Java. The Java thing started as a learning experience of what key-value stores do, how to build an abstract data model, how to model a query language, etc. As a result I now know the flow of a single byte from when it is received to where it is stored and all the design decisions along the way. When tweaking performance or other storage-related things this is very handy - also there is no excuse for bugs :-) I also believe that you write your best pieces of software when you write them for the second time.

To answer your question whether there was a need for zero dependencies: no, there was none. I am sure with a bit of refactoring you could easily use BoltDB as the underlying key-value store. For this you could tweak the graph.Manager code or just create a special graphstorage.GraphStorage. However, having my own implementation for this gives me a lot of freedom for fine tuning. I know exactly where locking occurs and how transactions work. If there is a bug I can pick the whole thing apart if this is required :-)
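The idea of swapping the underlying key-value store can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `KVStore`, `MemStore` and `NodeStore` are hypothetical names and are not the actual EliasDB `graph.Manager` or `graphstorage` API. The in-memory map stands in for a real backend such as BoltDB; the point is only that the graph layer talks to an interface and not to a concrete store.

```go
package main

import (
	"errors"
	"fmt"
)

// KVStore is a hypothetical minimal key-value interface.
// It is NOT the actual EliasDB graphstorage API.
type KVStore interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
}

// ErrNotFound is returned when a key does not exist.
var ErrNotFound = errors.New("key not found")

// MemStore is an in-memory stand-in for a real backend such as BoltDB.
type MemStore struct {
	data map[string][]byte
}

// NewMemStore creates an empty in-memory store.
func NewMemStore() *MemStore {
	return &MemStore{data: make(map[string][]byte)}
}

// Put stores a copy of value so later changes by the caller
// do not leak into the store.
func (m *MemStore) Put(key string, value []byte) error {
	m.data[key] = append([]byte(nil), value...)
	return nil
}

// Get returns a copy of the stored value or ErrNotFound.
func (m *MemStore) Get(key string) ([]byte, error) {
	v, ok := m.data[key]
	if !ok {
		return nil, ErrNotFound
	}
	return append([]byte(nil), v...), nil
}

// NodeStore is a hypothetical graph layer which depends only on KVStore,
// so the backend can be replaced without touching this code.
type NodeStore struct {
	kv KVStore
}

// StoreNode saves a node name under the composite key "kind/key".
func (n *NodeStore) StoreNode(kind, key, name string) error {
	return n.kv.Put(kind+"/"+key, []byte(name))
}

// FetchNode loads a node name by kind and key.
func (n *NodeStore) FetchNode(kind, key string) (string, error) {
	v, err := n.kv.Get(kind + "/" + key)
	if err != nil {
		return "", err
	}
	return string(v), nil
}

func main() {
	ns := &NodeStore{kv: NewMemStore()}
	if err := ns.StoreNode("Person", "123", "Alice"); err != nil {
		panic(err)
	}
	name, err := ns.FetchNode("Person", "123")
	fmt.Println(name, err)
}
```

A BoltDB-backed variant would be another type satisfying the same hypothetical `KVStore` interface; the trade-off described above is that locking and transaction behaviour would then be whatever the backend provides.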

Here are a few thoughts on Cayley and Dgraph. Both datastores are really impressive pieces of work.

Dgraph

  • Dgraph is distributed while EliasDB is not - however I am planning to change that :-)
  • EliasDB has a REST API which I can't see with Dgraph. They do however have clients for Python and Java which EliasDB doesn't have.
  • Dgraph uses a GraphQL-like syntax while EliasDB at the moment uses an SQL-like syntax. (GraphQL looks good though and I think EliasDB could use it too)
  • Dgraph seems to be only available for Linux and Mac. EliasDB runs on Linux and Windows (haven't tested on Mac but I don't see why it shouldn't run there)
  • One thing which is a bit concerning is the relatively low test coverage of Dgraph (at the time of writing this: 55%). Maybe I am just overly paranoid but I do think you need much more if you write a datastore. The types of bugs/regressions you can involuntarily introduce in such a complex and sensitive system can be very hard to track down.
  • Dgraph is backed by venture capital while EliasDB is a home-grown piece of work. Both environments result in different approaches to software development with both advantages and disadvantages. What I am saying is that you build things differently when money and pressure are involved.

Cayley

  • With an initial release in June 2014, Cayley is much more mature than EliasDB
  • Cayley has a built-in query editor and a visualizer. EliasDB doesn't have this (yet)
  • Cayley is quite an ambitious project. It supports multiple query languages and multiple backends. The focus is quite wide here which in turn means you can't implement everything 100%. There is bound to be "work-in-progress" code and quite a few bugs. Supporting multiple backends gives users a choice but also means concessions in terms of coding. Storing highly connected graph data in a SQL database is a misfeature in my opinion. My point is that if you try to support too much, your development effort quickly gets "bogged down" and your architectural options are limited. EliasDB aims to be lean and focused - there is only one datastore and (at the moment) only one query language.

EliasDB

EliasDB's intention is to give you a small, really, really easy-to-use datastore solution. At the moment it is a single executable. If you are, for example, a web developer you can just build a small web application demo and give it to your customers on a USB stick. You click on it and it "just works". That is, of course, not to say that you can only use it for toy applications. As with other datastores, EliasDB can store large amounts of data. There are a lot of features I would like to explore and eventually implement. However, my premise for EliasDB remains: "keep it as simple as possible".

krotik avatar Sep 25 '16 14:09 krotik

Wow great answer :-) let's leave this issue open for a small while so others can read it too.

Thanks!

abourget avatar Sep 25 '16 18:09 abourget

Hi,

This looks really good. You mention making it distributed. For read-intensive usage, what would be the most straightforward way to have a setup with some redundancy?

The top priority would be making sure that there is no data loss, so some kind of replication of the data store must be involved.

It just came to my mind that etcd could be used for master/slave election and as a way to replicate data over different instances. It might be somewhat slow on writes but as long as stale reads are allowed it shouldn't matter too much. The good thing about it would be keeping EliasDB simple. Thoughts on that?

jracabado avatar Sep 26 '16 22:09 jracabado

Hey,

etcd with its raft implementation looks good indeed. A very simple way to get EliasDB distributed would be to use etcd's key-value storage to model a distributed graph storage with it. You would need an object which satisfies the graphstorage.Storage interface. I could imagine that read performance would be quite reasonable.

However, I had something more "low-level" in mind for distributing EliasDB. I would imagine a "wrapper" object for a graphstorage.Storage implementation. This wrapper would use the wrapped storage for local storage and would add the distribution functionality. This means you could have normal disk storage or a cluster with memory-only storage which would be REALLY fast. I think the Dynamo paper from Amazon is a good starting point for research. I imagine:

  • A peer-to-peer approach without a coordinator.
  • Configurable replication factor. With replication factor n there must be n copies of a datum in the cluster. Up to n-1 cluster members can suffer permanent hardware failure without data loss in the cluster.
  • Eventual "Read-Your-Writes" consistency. (No read quorums or distributed transactions. I would need some kind of housekeeping thread)
  • Adding/removing peers is a manual process - i.e. the user needs to confirm that a member is lost permanently. Only this will trigger datastore rebalancing.

This is all still very much in the planning and research stage but I am really interested in the details of database distribution.
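As an illustration of the wrapper and replication-factor points in the list above, here is a minimal single-process sketch. It is not EliasDB code: `Storage`, `memStorage` and `Cluster` are hypothetical names, the peers are simulated in memory, and the placement rule (hash the key, then take the next n peers on a ring) is a simplification chosen for this example, not the scheme from the Dynamo paper. Rebalancing, networking and the housekeeping thread are left out.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Storage is a hypothetical, heavily simplified local storage interface.
type Storage interface {
	Put(key, value string)
	Get(key string) (string, bool)
}

// memStorage is a memory-only local storage.
type memStorage map[string]string

func (m memStorage) Put(key, value string) { m[key] = value }

func (m memStorage) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// Cluster simulates all peers in one process. Each peer wraps a local
// Storage; there is no coordinator, placement is derived from the key.
type Cluster struct {
	peers  []Storage
	alive  []bool
	factor int // replication factor n, assumed to be <= number of peers
}

// NewCluster creates size peers with memory-only storage.
func NewCluster(size, factor int) *Cluster {
	c := &Cluster{factor: factor}
	for i := 0; i < size; i++ {
		c.peers = append(c.peers, memStorage{})
		c.alive = append(c.alive, true)
	}
	return c
}

// replicas returns the indexes of the n peers responsible for key:
// the peer selected by the key hash plus the following n-1 peers.
func (c *Cluster) replicas(key string) []int {
	h := fnv.New32a()
	h.Write([]byte(key))
	start := int(h.Sum32() % uint32(len(c.peers)))
	idx := make([]int, 0, c.factor)
	for i := 0; i < c.factor; i++ {
		idx = append(idx, (start+i)%len(c.peers))
	}
	return idx
}

// Put writes the datum to every responsible peer which is still alive.
func (c *Cluster) Put(key, value string) {
	for _, i := range c.replicas(key) {
		if c.alive[i] {
			c.peers[i].Put(key, value)
		}
	}
}

// Get reads from the first responsible peer which is alive and has the key.
func (c *Cluster) Get(key string) (string, bool) {
	for _, i := range c.replicas(key) {
		if !c.alive[i] {
			continue
		}
		if v, ok := c.peers[i].Get(key); ok {
			return v, true
		}
	}
	return "", false
}

// Fail marks a peer as permanently lost.
func (c *Cluster) Fail(i int) { c.alive[i] = false }

func main() {
	c := NewCluster(5, 3)
	c.Put("node:1", "Alice")

	// Lose n-1 = 2 of the peers holding the datum; one copy remains.
	r := c.replicas("node:1")
	c.Fail(r[0])
	c.Fail(r[1])

	v, ok := c.Get("node:1")
	fmt.Println(v, ok)
}
```

With replication factor 3 the datum survives the loss of two of its three holders; losing the third as well would lose the datum, which is the n-1 bound stated in the list.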

krotik avatar Sep 27 '16 18:09 krotik

Alright now I've got to give this a shot.

Nice backstory!

faddat avatar Oct 01 '16 07:10 faddat

Check out buntdb. It's got a nice raft backend to replicate. Also it runs in memory with file backing. This is a great feature for speed.

ghost avatar Dec 11 '16 17:12 ghost

https://github.com/tidwall/buntdb

ghost avatar Dec 11 '16 17:12 ghost