sophia
Roadmap
What is the roadmap for sophia?
Thanks for the interest! :) I truly believe in work that is really personal, work you can put your soul and passion into, with no excuses.
Next release is going to be very important in terms of features and their impact on overall evolution. So, here is the roadmap:
sophia v1.2 (late november - december) (btm ~70% done)
- pure mvcc implementation (storage engine is version-aware)
- old versions are automatically cleaned up, due to merger design (in comparison to PostgreSQL, etc)
- real single-statement and multi-statement SERIALIZABLE transactions
- consistent cursors (you can consistently iterate over a database and do updates)
- cursors do not stop any merging processes
- completely multi-thread (everything, including cursors and any parallel operations)
- hot-backup (take database backups without stopping database processes)
- index snapshotting (this will greatly improve boot time)
- 100% test coverage, quality and reliability (i believe that your information is much more important than even the very best database, there should be no excuses; add coveralls link to github)
- amalgamation builds (sqlite style, single .c file which can be shipped with your project)
Features that are going to be implemented after the release:
- multi-thread merger (this should greatly improve merger performance)
- two-phase commit (to be able to use as a storage engine in distributed systems)
- in-memory only mode (fast in-memory database mode)
- incremental backups (rsync-like, maybe using librsync, or just let the user apply their own scheme and be sure that the data is consistent)
sophia v1.3
- secondary indexes
- multi-process access protocol, replication (probably networked access)
- compression
Hope you're going to like it ;)
I love all of it! Let me know how I can help in any way :)
@pmwkaa I'd like to keep sphia(1) in line with the new features and changes to the code base. What you've mentioned for the roadmap could make this tool really useful as far as replication, backups, restoration, etc.
@stephenmathieson care to join in on the fun? :)
hmm.. i haven't played with sophia, but i'm certainly interested
Join the party! Would love some help with github.com/jwerle/sphia
@pmwkaa
sophia v1.3
secondary indexes
I am a little bit confused by this. Does it mean that sophia supports complex data structures (not just strings or numbers), such as hash-like ones or nested structures like JSON objects?
sophia v1.2 (late november - december) (btm ~70% done)
completely multi-thread (everything, including cursors and any parallel operations)
What do you mean by "parallel operations"?
Thanks
@jwerle i think that is a great idea) i will keep you informed about new updates and features, specifications or anything in v1.2:) Thanks!)
@pcdinh you can already store any document object like json; the only thing you need is your own custom compare function, which will retrieve your keys from within a document and compare them accordingly.
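A minimal sketch of what such a compare function could look like, under stated assumptions: the callback shape `(a, asz, b, bsz, arg)` is an assumption about how a key comparator is passed in, `doc_id` and `doc_compare` are hypothetical names, and the "parser" is a naive scan for `"id":` rather than a real JSON parser. How the function gets registered with the database is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the integer that follows "id": in a
 * JSON-like document. Naive scan for illustration only; it does not
 * handle whitespace after the colon, nesting, or escaped strings. */
static long doc_id(const char *doc, size_t size)
{
    static const char tag[] = "\"id\":";
    const size_t taglen = sizeof(tag) - 1;

    for (size_t i = 0; i + taglen <= size; i++) {
        if (memcmp(doc + i, tag, taglen) != 0)
            continue;
        /* Copy the digits into a bounded, NUL-terminated buffer,
         * because the document itself may not be NUL-terminated. */
        char num[32];
        size_t n = 0;
        size_t j = i + taglen;
        while (j < size && n < sizeof(num) - 1 &&
               (doc[j] == '-' || (doc[j] >= '0' && doc[j] <= '9')))
            num[n++] = doc[j++];
        num[n] = '\0';
        return strtol(num, NULL, 10);
    }
    return 0; /* documents without an id sort as 0 in this sketch */
}

/* Assumed comparator shape: returns <0, 0 or >0, like memcmp. */
static int doc_compare(char *a, size_t asz, char *b, size_t bsz, void *arg)
{
    (void)arg; /* unused user argument */
    assert(a != NULL && b != NULL);
    long ia = doc_id(a, asz);
    long ib = doc_id(b, bsz);
    return (ia > ib) - (ia < ib);
}
```

With this, two documents are ordered by their `id` field regardless of where the field appears inside the document or what else the document contains.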
Speaking about secondary indexes, there is no such support yet. Right now it is a plain key-value database. But i think in time i will add support for such functionality. I imagine it will be possible to maintain chained databases and do consistent updates on them in some optimized manner. It would be possible to query different indexes separately. Later, there would be support for online index creation, drop, etc. But there is still a long way to go in that direction, and that is not a priority right now.
By parallel operations, i mean there would be complete support for use in a user's multi-threaded environment, with a real mvcc transaction model. E.g., it would be possible to do a consistent database traversal, do updates at the same time and have the feel of real SERIALIZABLE isolation.
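To make the "consistent traversal while updating" idea concrete, here is a generic MVCC sketch, not sophia's actual internals: every key keeps a chain of versions stamped with a sequence number, and a reader only sees the newest version that is not newer than its snapshot. The names `struct version`, `version_push` and `version_visible` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One version of a single key; newer versions link to older ones. */
struct version {
    uint64_t lsn;                 /* sequence number of the writing commit */
    int value;
    const struct version *older;  /* next older version, or NULL */
};

/* Put a new version in front of the chain. Sequence numbers must grow. */
static void version_push(struct version *v, const struct version *head,
                         uint64_t lsn, int value)
{
    assert(head == NULL || lsn > head->lsn);
    v->lsn = lsn;
    v->value = value;
    v->older = head;
}

/* Return the newest version a reader with the given snapshot may see,
 * or NULL if the key did not exist yet at that point. */
static const struct version *
version_visible(const struct version *newest, uint64_t snapshot_lsn)
{
    for (const struct version *v = newest; v != NULL; v = v->older)
        if (v->lsn <= snapshot_lsn)
            return v;
    return NULL;
}
```

A cursor opened at snapshot 25 keeps reading the version written at 20 even after a writer adds a version at 30, which is why readers and writers do not have to block each other.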
I think that the only thing that will change in the v1.2 API from the user's point of view is that the sp_begin() function will return a transaction pointer. And that is all ;)
For example:
    void *db = sp_open(..)
    sp_set(db, key, value);   # will do single-stmt transaction, semantic will not change

    void *txn = sp_begin(db)
    sp_set(txn, key, value)   # do multi-stmt transaction
    sp_set(txn, ...)
    sp_get(txn, ...)          # will see changes made by current transaction or visible before it

    sp_commit(txn) or sp_rollback(txn) or sp_destroy(txn)
sophia v1.2 (late november - december) (btm ~70% done) pure mvcc implementation (storage engine is version-aware)
Someone told me that supporting mvcc will make the code base bloated, is that true?
sophia v1.3 multi-process access protocol, replication (probably networked access)
Don't do it! It is better to develop sophia storage engine for MySQL or MariaDB. This is a sample implementation of LevelDB https://mariadb.atlassian.net/browse/MDEV-3841
Someone told me that supporting mvcc will make the code base bloated, is that true?
Yes, it's partly true. Introducing multi-versioning is a big task, mostly comparable to remaking the whole engine logic. But it's up to the implementation anyway; i managed to make it as simple as possible and without visible performance degradation for now. lmdb, for example, has a very small multi-version b-tree specific implementation.
Don't do it! It is better to develop sophia storage engine for MySQL or MariaDB.
Thanks! I will take a look at it :)
I think replication (w/ or w/o networking) isn't supposed to be in a storage engine. That's a higher level issue!
Hot-backup, though, is already a great option.
I think replication (w/ or w/o networking) isn't supposed to be in a storage engine. That's a higher level issue!
This is what I meant. If you create a storage engine in MySQL, then replication and (not hot) backup will be handled by MySQL.
A MySQL storage engine is a huge effort. It would be nice if there were a cleaner API (handler.h is huge and some behavior is obscure). The LevelDB storage engine that was cited above is a proof-of-concept, but some code from it could be reused here like the code for generating one byte array for a multi-part key. It would be nice if there were a chance for reuse between storage engines that have similar feature sets. But maybe the limited developer time is better spent making Sophia better and then integrating this into Tarantool.
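For readers unfamiliar with the "one byte array for a multi-part key" idea, here is a generic sketch of the technique, not the code from the LevelDB storage engine mentioned above. It assumes a hypothetical two-part key (a 32-bit id and a NUL-free string); `key_encode` and `key_compare` are illustrative names. The integer is written big-endian so that plain byte-wise comparison orders keys the same way as comparing the parts one by one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode (id, name) into out and return the number of bytes written.
 * Layout: 4 bytes big-endian id, then the string including its NUL. */
static size_t key_encode(unsigned char *out, size_t cap,
                         uint32_t id, const char *name)
{
    size_t len = strlen(name);
    assert(cap >= 4 + len + 1);
    out[0] = (unsigned char)(id >> 24);
    out[1] = (unsigned char)(id >> 16);
    out[2] = (unsigned char)(id >> 8);
    out[3] = (unsigned char)id;
    memcpy(out + 4, name, len + 1); /* trailing NUL acts as terminator */
    return 4 + len + 1;
}

/* Byte-wise comparison; on a common prefix the shorter key sorts first. */
static int key_compare(const unsigned char *a, size_t asz,
                       const unsigned char *b, size_t bsz)
{
    size_t n = asz < bsz ? asz : bsz;
    int rc = memcmp(a, b, n);
    if (rc != 0)
        return rc;
    return (asz > bsz) - (asz < bsz);
}
```

Because the encoded form sorts correctly under a plain byte comparison, a key-value engine does not need to know anything about the key's parts, which is what makes such code reusable between storage engines.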
"and then integrating this into Tarantool." That is the plan I think. :-)
@mdcallag
I mentioned MySQL here because MySQL doesn't have this feature (a write-optimized storage engine). I have read about the TokuDB storage engine; although it is GPL'ed, it is patented technology. So we can't use that engine on a production server, right?
- Anyway, I'm not sure about mixed GPL+patent's consequences for production usage.
I am not a lawyer so I won't answer your question about use. TokuDB is distributed as open source and included in MariaDB and Percona/MySQL. My brother works at Tokutek and is happy to speak with potential users.
Hi,
Any idea what the planned release date for sophia v1.3 (or a v1.3 release candidate) is?
Hello,
Do you need some particular feature, like secondary indexes?
It's been a while and unfortunately i can't give any fixed date for sure right now. In the time since the last release, i've made a couple of new engine prototypes trying to improve sophia's behavior on large data sets and its memory management under high load. It took a lot of time, but i believe i'm on the right path now.
sophia v1.2 development status: https://groups.google.com/forum/#!topic/sophia-database/C8zjKliVS3c
Yes, I am interested in secondary indexes. But, I was just curious. Lack of secondary indexes is not a showstopper for my project.
I'd rather a stable engine over new features so keep up the good work on your current track.
@pmwkaa any goodies coming soon ?
Any news on compression, secondary indexes or networking?
Yes! After making several prototypes and trying new ideas for the internal design, i believe i found a good one to continue development with.
Work is going according to plan, and upcoming features are:
- multi-threaded merger and internal data sharding, sophia will use much less memory
- storage design made ready for secondary index support (real support scheduled for v1.3), does less io and is group-commit ready
- support for multiple databases with on-line creation/drop support, databases share a single environment (thread pool, resources, etc.)
- MVCC implementation, automatic gc with merge (no external 'vacuum' needed)
- multi-stmt transactions (optimistic design, with less performance overhead) and consistent cursors
- hot-backup support
- engine implementation is completely rewritten, for accurate testing and future project development
- everything is kept simple
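The "automatic gc with merge" item can be pictured roughly as follows. This is a generic sketch under stated assumptions, not sophia's merger: records arrive sorted by key with the newest version first, and `oldest_snapshot` is the sequence number of the oldest reader still active. The names `struct rec` and `merge_gc` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rec {
    int key;
    uint64_t lsn;   /* sequence number of the commit that wrote it */
    int value;
};

/* Compact r[0..n) in place and return the new record count.
 * Input must be sorted by key ascending, then lsn descending.
 * For each key, keep every version newer than oldest_snapshot plus the
 * newest version that the oldest reader can still see; drop the rest. */
static size_t merge_gc(struct rec *r, size_t n, uint64_t oldest_snapshot)
{
    size_t out = 0;
    int have_base = 0;  /* kept a version visible to the oldest reader? */
    int prev_key = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || r[i].key != prev_key) {
            assert(i == 0 || r[i].key > prev_key); /* sorted input */
            have_base = 0;
            prev_key = r[i].key;
        }
        if (r[i].lsn > oldest_snapshot) {
            r[out++] = r[i];   /* newer readers may still need it */
        } else if (!have_base) {
            r[out++] = r[i];   /* newest version the oldest reader sees */
            have_base = 1;
        }
        /* otherwise the version is shadowed for every reader: dropped */
    }
    return out;
}
```

Since the merge already rewrites the data, dropping shadowed versions at that point costs no extra pass, which is the sense in which no external 'vacuum' is needed.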
I've started working on integrating sophia as disk storage for the Tarantool project lately: http://tarantool.org https://github.com/tarantool/tarantool
Since i'm now able to spend more time on sophia integration and its development (as part of the tarantool team), i plan to make a release in July.
Thanks for the interest! :)
Sounds great! I'm looking forward to seeing these new features!
Development branch has been published: https://github.com/pmwkaa/sophia/tree/dev
https://groups.google.com/forum/#!topic/sophia-database/C8zjKliVS3c announce and intrigue :)
Woot! Thanks for the detailed update. On Jul 22, 2014 1:12 PM, "Dmitry Simonenko" wrote:
Development branch has been published: https://github.com/pmwkaa/sophia/tree/dev
https://groups.google.com/forum/#!topic/sophia-database/C8zjKliVS3c announce and intrigue :)
@pmwkaa i'm getting really excited with all this dev work
Trying to make it worth the long wait. Hope you guys like it :)
Lucene is another database you might be interested in. It is the major open source text search engine, and has a modular "codec" plugin design for the actual key-value storage engine.
There have been other projects to use Cassandra as the storage engine for Lucene. The native engine is coded in Java. Sophia might have advantages over it.
http://lucene.apache.org/
Thanks, i'll take a look :)