
Storage engines

Open pkieltyka opened this issue 7 years ago • 13 comments

Awesome project! Any thought of having a pluggable storage engine, where the default is memory? Having a BoltDB option would be very nice.

pkieltyka avatar Oct 12 '16 16:10 pkieltyka

It's certainly possible. BoltDB would be a great alternative.

Right now SummitDB uses BuntDB as the database library. It's similar to Bolt but has secondary indexing and spatial indexing built in. Unfortunately, Bolt does not have indexing at this time, so there would need to be some extra work around creating indexes using buckets. I'm assuming that R-tree support is neither available in Bolt nor on its roadmap, but Summit could reimplement an R-tree structure specifically for Bolt.

Perhaps one option is to use BoltDB for the key space and BuntDB for the indexing.
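To illustrate the "indexes using buckets" idea, here is a minimal sketch of a secondary index kept as composite keys in an ordered key space. It uses only the Go standard library: `Store`, `Set`, and `KeysByValue` are hypothetical names, not the BoltDB or BuntDB API, and a sorted slice stands in for an ordered bucket. It also assumes values never contain a NUL byte.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Store is a hypothetical stand-in for two ordered buckets:
// one for the key space and one for a value-ordered index.
type Store struct {
	data  map[string]string // primary key -> value
	index []string          // sorted "value\x00key" entries
}

func NewStore() *Store {
	return &Store{data: make(map[string]string)}
}

// indexEntry builds the composite key stored in the index.
func indexEntry(value, key string) string { return value + "\x00" + key }

// Set writes the value and keeps the index in step with it.
func (s *Store) Set(key, value string) {
	if old, ok := s.data[key]; ok {
		s.removeIndex(indexEntry(old, key)) // drop the stale entry first
	}
	s.data[key] = value
	e := indexEntry(value, key)
	i := sort.SearchStrings(s.index, e)
	s.index = append(s.index, "")
	copy(s.index[i+1:], s.index[i:])
	s.index[i] = e
}

func (s *Store) removeIndex(e string) {
	i := sort.SearchStrings(s.index, e)
	if i < len(s.index) && s.index[i] == e {
		s.index = append(s.index[:i], s.index[i+1:]...)
	}
}

// KeysByValue returns primary keys ordered by value, then by key.
func (s *Store) KeysByValue() []string {
	keys := make([]string, 0, len(s.index))
	for _, e := range s.index {
		_, key, _ := strings.Cut(e, "\x00")
		keys = append(keys, key)
	}
	return keys
}

func main() {
	s := NewStore()
	s.Set("user:1", "tom")
	s.Set("user:2", "andy")
	s.Set("user:3", "janet")
	s.Set("user:1", "zed") // an update has to rewrite the index entry too
	fmt.Println(s.KeysByValue())
}
```

The extra work mentioned above is visible in `Set`: every write has to touch both structures, and in a real store both updates would need to happen in the same transaction.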

Bolt and indexes, R-tree library, BuntDB

tidwall avatar Oct 12 '16 17:10 tidwall

@tidwall also, it's confusing that the description says it's an in-memory NoSQL database, yet the default is to have data persistence. I am happy that it has data persistence, but the description threw me off at first read.

pkieltyka avatar Oct 12 '16 17:10 pkieltyka

Thanks for the feedback. I see how that might be confusing. Perhaps I should rephrase that to "In-memory database with disk persistence" or something along that line.

tidwall avatar Oct 12 '16 17:10 tidwall

@tidwall what does the in-memory part refer to? That the working set is in memory? That the entire db is in memory? Or something else?

pkieltyka avatar Oct 12 '16 17:10 pkieltyka

The database is entirely in memory; it is the working dataset. Each write command is appended to a file, which is used to rebuild the database whenever it is restarted.

This is similar to Redis AOF persistence.
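A rough sketch of that append-and-replay idea, using only the Go standard library. This is not SummitDB's actual implementation: the `DB` type, the `SET`/`DEL` text format, and the `scratchPath` helper are all assumptions made for illustration. The sketch also never calls fsync and assumes keys and values contain no whitespace.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// DB is a hypothetical in-memory store backed by an append-only log.
type DB struct {
	data map[string]string
	log  *os.File
}

// Open replays the log at path, if one exists, then opens it for appending.
func Open(path string) (*DB, error) {
	db := &DB{data: make(map[string]string)}
	if f, err := os.Open(path); err == nil {
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			db.apply(strings.Fields(sc.Text()))
		}
		f.Close()
		if err := sc.Err(); err != nil {
			return nil, err
		}
	} else if !os.IsNotExist(err) {
		return nil, err
	}
	log, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	db.log = log
	return db, nil
}

// apply mutates memory only; unknown commands are ignored.
func (db *DB) apply(args []string) {
	switch {
	case len(args) == 3 && args[0] == "SET":
		db.data[args[1]] = args[2]
	case len(args) == 2 && args[0] == "DEL":
		delete(db.data, args[1])
	}
}

// Exec appends a write command to the log, then applies it to memory.
func (db *DB) Exec(args ...string) error {
	if _, err := fmt.Fprintln(db.log, strings.Join(args, " ")); err != nil {
		return err
	}
	db.apply(args)
	return nil
}

// Get reads straight from memory and never touches the log.
func (db *DB) Get(key string) (string, bool) {
	v, ok := db.data[key]
	return v, ok
}

func (db *DB) Close() error { return db.log.Close() }

// scratchPath returns a throwaway log location for the demo.
func scratchPath() string {
	p := filepath.Join(os.TempDir(), "aof-sketch.log")
	os.Remove(p) // ignore the error: the file may not exist
	return p
}

func main() {
	path := scratchPath()
	defer os.Remove(path)

	db, err := Open(path)
	if err != nil {
		panic(err)
	}
	db.Exec("SET", "fruit", "apple")
	db.Exec("SET", "veg", "kale")
	db.Exec("DEL", "veg")
	db.Close()

	// Simulate a restart: state is rebuilt purely from the log.
	db, err = Open(path)
	if err != nil {
		panic(err)
	}
	defer db.Close()
	fmt.Println(db.Get("fruit"))
	fmt.Println(db.Get("veg"))
}
```

Reads are served from the map alone, which is why the whole data set has to fit in memory.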

tidwall avatar Oct 12 '16 17:10 tidwall

@tidwall I see. If summitdb requires that the entire data set fit in memory, and the data in memory is the core working set, then I agree it is an in-memory database with disk persistence options. But then I'd wonder why someone would choose this over Redis. Just because of Raft clustering with strong consistency? I personally feel a gap in database products is something Redis-compatible that supports data persistence engines for data sets that can grow to 100GB+. Perhaps that's a different product, like ledisdb or ssdb. The Raft angle is pretty cool though.

pkieltyka avatar Oct 12 '16 19:10 pkieltyka

Under the hood SummitDB is quite a bit different from Redis. SummitDB is more suited as a NoSQL data store. In a way it's more like MongoDB.

I just open sourced the project yesterday, so it's too early to say if anyone will use it over Redis (that's not really my goal). The best I can tell you is why I wrote it and why I'm going to use it:

  • Raft. Strong consistency is something lacking from the Redis Master/Follower model right now. If Redis had this today, I probably would not have created summitdb.
  • Ordered key space. Getting a single key in Redis is super fast, but iteration, paging, and sorting on many keys can be somewhat more challenging and sometimes require multiple steps.
  • Secondary indexing. There are ways to kinda do indexing in Redis, but they require using combinations of sorted sets mixed with other data types.
  • Spatial indexing. A built in R-tree structure can be super versatile and allow for multi-dimensional data like geospatial and statistics. This is not available in Redis.
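As a small illustration of why an ordered key space helps with iteration and paging, here is a sketch in Go. `OrderedKeys` and its methods are hypothetical names, not SummitDB commands, and a sorted slice stands in for whatever ordered structure the store uses internally.

```go
package main

import (
	"fmt"
	"sort"
)

// OrderedKeys is a hypothetical sorted key set.
type OrderedKeys struct{ keys []string }

// Add inserts key at its sorted position; duplicates are ignored.
func (o *OrderedKeys) Add(key string) {
	i := sort.SearchStrings(o.keys, key)
	if i < len(o.keys) && o.keys[i] == key {
		return
	}
	o.keys = append(o.keys, "")
	copy(o.keys[i+1:], o.keys[i:])
	o.keys[i] = key
}

// Page returns up to limit keys strictly greater than after.
// Passing the last key of one page as after yields the next page.
func (o *OrderedKeys) Page(after string, limit int) []string {
	if limit < 0 {
		limit = 0
	}
	i := sort.Search(len(o.keys), func(i int) bool { return o.keys[i] > after })
	end := i + limit
	if end > len(o.keys) {
		end = len(o.keys)
	}
	return append([]string(nil), o.keys[i:end]...)
}

func main() {
	var o OrderedKeys
	for _, k := range []string{"user:3", "user:1", "user:4", "user:2"} {
		o.Add(k)
	}
	first := o.Page("", 2)
	fmt.Println(first)
	fmt.Println(o.Page(first[len(first)-1], 2))
}
```

Because the keys are already in order, each page is a single binary search plus a slice, with no separate sorting step.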

I've been using Redis for years as a general purpose data store. Sometimes in combination with MongoDB (and recently Tile38 for geospatial). My desire is to merge what I see as common overlaps into a single platform.

I know that being in-memory will not suit everyone and I'm OK with that. I'm hopeful that those who use Redis as a primary data store might find value in a tool like SummitDB.

All that being said, I do like the idea of trying BoltDB as a disk-based storage option in the future.

tidwall avatar Oct 12 '16 20:10 tidwall

I have to agree @pkieltyka. I'm searching for a solution like Redis but with disk-based storage, one that supports data sets larger than the available RAM and supports clustering without configuring complicated third-party proxies etc. So I absolutely like the idea of an additional disk-based storage engine for summitdb. This IMO fills the gap between Redis, which is memory-only, and a full-fledged document database.

@tidwall many thanks for your work

railsmechanic avatar Oct 15 '16 07:10 railsmechanic

@railsmechanic Good feedback. While disk-based storage is not what I personally desire for my applications, there is a clear interest in the community that can't be ignored. Perhaps it may be as easy as dropping in BoltDB or perhaps the current BuntDB implementation can be modified to support offloading to disk. I plan on researching this topic further.

tidwall avatar Oct 15 '16 15:10 tidwall

I could see myself using it in tandem with Redis. If it supported replication then I could retire the fork I have of Redis, redis-interval-sets, and look at using the spatial index for IP-to-block mapping. I like it and will be playing with it this weekend to do IP-to-subnet and IP-to-ASN mapping with R-trees.
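For reference, the IP-to-subnet lookup can be framed as a point-in-interval query: each IPv4 subnet becomes a 1-D range of integers, and the lookup asks which ranges contain a given address. The sketch below answers that with a plain linear scan rather than an R-tree, which would answer the same query faster on large sets. `Interval`, `fromCIDR`, and `lookup` are hypothetical names, and only IPv4 is handled.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// Interval is a subnet expressed as an inclusive range of IPv4 integers.
type Interval struct {
	Lo, Hi uint32
	Label  string
}

// fromCIDR converts an IPv4 CIDR block into its integer range.
func fromCIDR(cidr, label string) (Interval, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return Interval{}, err
	}
	if !p.Addr().Is4() {
		return Interval{}, fmt.Errorf("only IPv4 is handled in this sketch: %s", cidr)
	}
	a := p.Masked().Addr().As4()
	lo := binary.BigEndian.Uint32(a[:])
	hi := lo | (^uint32(0) >> uint(p.Bits())) // set all host bits
	return Interval{Lo: lo, Hi: hi, Label: label}, nil
}

// lookup returns the label of the narrowest interval containing ip.
func lookup(ivs []Interval, ip string) (string, bool) {
	addr, err := netip.ParseAddr(ip)
	if err != nil || !addr.Is4() {
		return "", false
	}
	a := addr.As4()
	x := binary.BigEndian.Uint32(a[:])
	best, found := Interval{}, false
	for _, iv := range ivs {
		if x < iv.Lo || x > iv.Hi {
			continue
		}
		if !found || iv.Hi-iv.Lo < best.Hi-best.Lo {
			best, found = iv, true
		}
	}
	return best.Label, found
}

func main() {
	var ivs []Interval
	// The labels here are made up for the example.
	for cidr, label := range map[string]string{
		"10.0.0.0/8":  "net-wide",
		"10.1.0.0/16": "net-narrow",
	} {
		iv, err := fromCIDR(cidr, label)
		if err != nil {
			panic(err)
		}
		ivs = append(ivs, iv)
	}
	fmt.Println(lookup(ivs, "10.1.2.3"))
	fmt.Println(lookup(ivs, "10.2.0.1"))
	fmt.Println(lookup(ivs, "192.168.0.1"))
}
```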

pedigree avatar Oct 16 '16 13:10 pedigree

Hi @pedigree. SummitDB currently supports State Machine Replication using the Raft Consensus Algorithm instead of Redis-style replication. I hope summitdb helps with your solution and thanks for your interest in the project.

tidwall avatar Oct 16 '16 16:10 tidwall

I have several read-only geographic replicas configured for API nodes, and they run a local copy of the Redis database in order to provide local access instead of HA. I love the project :)

pedigree avatar Oct 16 '16 17:10 pedigree

Hi. Just following up on this issue in 2020. Having a disk-backed MongoDB alternative would be awesome. Thanks!

jjzazuet avatar Jan 09 '20 03:01 jjzazuet