sophia

Roadmap

Open jwerle opened this issue 10 years ago • 29 comments

What is the roadmap for sophia?

jwerle, Oct 16 '13

Thanks for the interest! :) I truly believe in work that is deeply personal, work you can put your soul and passion into, with no excuses.

The next release is going to be very important, both for its features and for their impact on the project's overall evolution. Here is the roadmap:

sophia v1.2 (late November to December) (btw, ~70% done)

  • pure MVCC implementation (the storage engine is version-aware)
  • old versions are cleaned up automatically as part of the merger design (in contrast to PostgreSQL, etc.)
  • real single-statement and multi-statement SERIALIZABLE transactions
  • consistent cursors (you can iterate over a database consistently while doing updates)
  • cursors do not block any merging processes
  • fully multi-threaded (everything, including cursors and any parallel operations)
  • hot backup (take database backups without stopping database processes)
  • index snapshotting (this will greatly improve boot time)
  • 100% test coverage, quality and reliability (I believe your data is much more important than even the very best database; there should be no excuses; add a Coveralls link on GitHub)
  • amalgamation builds (SQLite style: a single .c file that can be shipped with your project)
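
To make the first two bullets more concrete, here is a minimal sketch of version-aware reads and merge-time cleanup. It assumes each key keeps a newest-first chain of versions tagged with a commit sequence number (`lsn`), and that a reader sees the newest version at or below its snapshot. `struct version`, `visible()` and `merge_gc()` are hypothetical names for illustration, not sophia's internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version record: one value of a key, committed at `lsn`.
 * Versions are kept newest-first in a singly linked chain. */
struct version {
    uint64_t lsn;            /* commit sequence number */
    const char *value;
    struct version *next;    /* next older version */
};

/* Return the newest version visible to a reader whose snapshot was taken
 * at `snapshot_lsn`, or NULL if the key did not exist at that point. */
const struct version *visible(const struct version *chain, uint64_t snapshot_lsn)
{
    for (const struct version *v = chain; v != NULL; v = v->next)
        if (v->lsn <= snapshot_lsn)
            return v;
    return NULL;
}

/* Merge-time cleanup. `oldest_lsn` is the snapshot of the oldest active
 * reader: the newest version with lsn <= oldest_lsn is still needed by that
 * reader, and every version older than it can never be read again, so the
 * chain is cut there. Returns how many versions were dropped (a real engine
 * would also free them). */
size_t merge_gc(struct version *chain, uint64_t oldest_lsn)
{
    for (struct version *v = chain; v != NULL; v = v->next) {
        if (v->lsn <= oldest_lsn) {
            size_t dropped = 0;
            for (struct version *old = v->next; old != NULL; old = old->next)
                dropped++;
            v->next = NULL;
            return dropped;
        }
    }
    return 0;    /* every version is newer than the oldest reader: keep all */
}
```

In this model the pass that merges data is also the pass that drops unreachable versions, so no separate cleanup job is needed.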

Features to be implemented after the release:

  • multi-threaded merger (this should greatly improve merger performance)
  • two-phase commit (so sophia can be used as a storage engine in distributed systems)
  • in-memory-only mode (a fast in-memory database mode)
  • incremental backups (rsync-like, maybe using librsync, or simply letting users apply their own scheme while guaranteeing the data is consistent)

sophia v1.3

  • secondary indexes
  • multi-process access protocol, replication (probably networked access)
  • compression

Hope you're going to like it ;)

pmwkaa, Oct 16 '13

I love all of it! Let me know how I can help in any way :)

jwerle, Oct 16 '13

@pmwkaa I'd like to keep sphia(1) in line with the new features and changes to the code base. What you've described in the roadmap could make this tool really useful for replication, backups, restoration, etc.

jwerle, Oct 17 '13

@stephenmathieson care to join in on the fun? :)

jwerle, Oct 17 '13

hmm.. i haven't played with sophia, but i'm certainly interested

stephenmathieson, Oct 17 '13

Join the party! Would love some help with github.com/jwerle/sphia

jwerle, Oct 17 '13

@pmwkaa

sophia v1.3

    secondary indexes

I am a little bit confused by this. Does it mean that sophia supports complex data structures (not just strings or numbers), such as hash-like ones or nested structures like JSON objects?

sophia v1.2 (late november - december) (btm ~70% done)
    completely multi-thread (everything, including cursors and any parallel operations)

What do you mean by "parallel operations"?

Thanks

pcdinh, Oct 18 '13

@jwerle I think that is a great idea :) I will keep you informed about new updates, features, specifications, or anything else in v1.2 :) Thanks!

@pcdinh you can already store any document object, such as JSON; all you need is your own custom compare function, which retrieves the keys from within a document and compares them accordingly.
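
As an illustration of that suggestion, here is a minimal sketch of such a compare function, assuming documents are JSON-like byte strings carrying an integer `"id"` field. `extract_id()` and `doc_cmp()` are hypothetical, and the `(a, asize, b, bsize, arg)` parameter shape is an assumption rather than sophia's documented comparator type; check `sophia.h` for the signature your version expects. The key scan is deliberately naive and is not a JSON parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the integer value of an "id" field in a
 * JSON-like document of `size` bytes (not necessarily NUL-terminated).
 * Returns 0 on success, -1 if the field is missing or malformed.
 * Naive scan: no real JSON parsing, no overflow check. */
static int extract_id(const char *doc, size_t size, long *out)
{
    static const char pattern[] = "\"id\":";
    const size_t plen = sizeof(pattern) - 1;
    for (size_t i = 0; i + plen <= size; i++) {
        if (memcmp(doc + i, pattern, plen) != 0)
            continue;
        size_t j = i + plen;
        while (j < size && doc[j] == ' ')
            j++;
        int neg = 0;
        if (j < size && doc[j] == '-') { neg = 1; j++; }
        if (j >= size || doc[j] < '0' || doc[j] > '9')
            return -1;
        long v = 0;
        while (j < size && doc[j] >= '0' && doc[j] <= '9')
            v = v * 10 + (doc[j++] - '0');
        *out = neg ? -v : v;
        return 0;
    }
    return -1;
}

/* Hypothetical comparator: orders documents by their numeric "id" field,
 * falling back to byte-wise order when either document lacks one.
 * Returns <0, 0 or >0 like memcmp. */
int doc_cmp(const char *a, size_t asz, const char *b, size_t bsz, void *arg)
{
    (void)arg;    /* unused in this sketch */
    long ka, kb;
    if (extract_id(a, asz, &ka) != 0 || extract_id(b, bsz, &kb) != 0) {
        size_t n = asz < bsz ? asz : bsz;
        int c = memcmp(a, b, n);
        if (c != 0)
            return c < 0 ? -1 : 1;
        return asz < bsz ? -1 : (asz > bsz ? 1 : 0);
    }
    return ka < kb ? -1 : (ka > kb ? 1 : 0);
}
```

Comparing by the extracted value rather than by raw bytes is the point: `{"id":10}` sorts before `{"id":9}` byte-wise, but after it by value.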

As for secondary indexes, there is no such support yet; right now it is a plain key-value database. But I think I will add support for such functionality in time. I imagine it will be possible to maintain chained databases, update them consistently in some optimized manner, and query the different indexes separately. Later there would be support for online index creation, dropping, etc. But there is still a long way to go in that direction, and it is not a priority right now.

By parallel operations, I mean there will be complete support for use in a multi-threaded user environment, with a real MVCC transaction model. For example, it will be possible to traverse a database consistently while doing updates at the same time, with real SERIALIZABLE isolation.

I think the only thing that will change in the v1.2 API from the user's point of view is that the sp_begin() function will return a transaction pointer. And that's all ;)

For example:

    void *db = sp_open(...);
    sp_set(db, key, value);    /* single-statement transaction; semantics do not change */

    void *txn = sp_begin(db);  /* start a multi-statement transaction */
    sp_set(txn, key, value);
    sp_set(txn, ...);
    sp_get(txn, ...);          /* sees changes made by the current transaction or visible before it */

    sp_commit(txn);            /* or sp_rollback(txn), or sp_destroy(txn) */
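
The comment on `sp_get(txn, ...)` describes read-your-own-writes semantics. The sketch below models that behaviour in memory, assuming a transaction buffers its writes privately and consults them before falling back to committed data. `struct store`, `struct txn`, `txn_begin()`, `txn_set()`, `txn_get()`, `txn_commit()` and `txn_rollback()` are hypothetical stand-ins, not sophia's API, and there is no concurrency control here.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 16

/* Hypothetical committed store: a tiny fixed-size key/value table. */
struct kv { const char *key; const char *value; };
struct store { struct kv items[MAX_KEYS]; size_t n; };

/* Hypothetical transaction: a private write set on top of the store. */
struct txn { struct store *db; struct store writes; };

static struct kv *find(struct store *s, const char *key)
{
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->items[i].key, key) == 0)
            return &s->items[i];
    return NULL;
}

/* Insert or overwrite; returns -1 when the table is full. */
static int put(struct store *s, const char *key, const char *value)
{
    struct kv *e = find(s, key);
    if (e != NULL) { e->value = value; return 0; }
    if (s->n == MAX_KEYS) return -1;
    s->items[s->n].key = key;
    s->items[s->n].value = value;
    s->n++;
    return 0;
}

void txn_begin(struct txn *t, struct store *db) { t->db = db; t->writes.n = 0; }

/* Buffered in the write set; invisible to the store until commit. */
int txn_set(struct txn *t, const char *key, const char *value)
{
    return put(&t->writes, key, value);
}

/* Read-your-own-writes: check the write set first, then committed data. */
const char *txn_get(struct txn *t, const char *key)
{
    struct kv *e = find(&t->writes, key);
    if (e == NULL)
        e = find(t->db, key);
    return e ? e->value : NULL;
}

/* Apply buffered writes to the store. A real engine would detect conflicts
 * and make this atomic; this sketch does neither. */
int txn_commit(struct txn *t)
{
    for (size_t i = 0; i < t->writes.n; i++)
        if (put(t->db, t->writes.items[i].key, t->writes.items[i].value) != 0)
            return -1;
    t->writes.n = 0;
    return 0;
}

/* Discard buffered writes; committed data is untouched. */
void txn_rollback(struct txn *t) { t->writes.n = 0; }
```

A private write set alone gives read-your-own-writes, not SERIALIZABLE isolation; that additionally needs conflict checks at commit.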

pmwkaa, Oct 18 '13

sophia v1.2 (late november - december) (btm ~70% done) pure mvcc implementation (storage engine is version-aware)

Someone told me that supporting MVCC will make the code base bloated. Is that true?

sophia v1.3 multi-process access protocol, replication (probably networked access)

Don't do it! It would be better to develop a sophia storage engine for MySQL or MariaDB. Here is a sample implementation for LevelDB: https://mariadb.atlassian.net/browse/MDEV-3841

awakmu, Nov 05 '13

Someone told me that supporting MVCC will make the code base bloated. Is that true?

Yes, it's partly true. Introducing multi-versioning is a big task, roughly comparable to remaking the whole engine logic. But it depends on the implementation anyway: I managed to keep it as simple as possible, without visible performance degradation so far. LMDB, for example, has a very small multi-version, B-tree-specific implementation.

Don't do it! It would be better to develop a sophia storage engine for MySQL or MariaDB.

Thanks! I will take a look at it :)

pmwkaa, Nov 05 '13

I think replication (w/ or w/o networking) isn't supposed to be in a storage engine. That's a higher level issue!

Hot-backup, though, is already a great option.

19h, Nov 05 '13

I think replication (w/ or w/o networking) isn't supposed to be in a storage engine. That's a higher level issue!

That is what I meant. If you create a storage engine for MySQL, then replication and (non-hot) backup will be handled by MySQL.

awakmu, Nov 05 '13

A MySQL storage engine is a huge effort. It would be nice if there were a cleaner API (handler.h is huge and some behavior is obscure). The LevelDB storage engine cited above is a proof of concept, but some of its code could be reused here, such as the code that generates a single byte array for a multi-part key. It would be nice if storage engines with similar feature sets had a chance to share code. But maybe the limited developer time is better spent making Sophia better and then integrating it into Tarantool.
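
As a rough illustration of generating one byte array for a multi-part key, here is a sketch of an order-preserving encoding for a hypothetical (string, uint32) key, assuming the engine orders keys with plain `memcmp`-style byte comparison. The layout (string with `0x00` escaped as `0x00 0xFF` and terminated by `0x00 0x00`, then a big-endian integer) is one common scheme, not the LevelDB engine's actual code, and `encode_key()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode (name, id) into `out` so that byte-wise order of the encoded keys
 * matches ordering by name first (byte-wise), then by id.
 *   name: each 0x00 byte is escaped as 00 FF; the part ends with 00 00
 *   id:   4 bytes, big-endian, so numeric order equals byte order
 * Returns the encoded length, or 0 if out_cap is too small. */
size_t encode_key(const char *name, size_t name_len, uint32_t id,
                  unsigned char *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < name_len; i++) {
        unsigned char c = (unsigned char)name[i];
        size_t need = (c == 0x00) ? 2 : 1;
        if (n + need > out_cap)
            return 0;
        out[n++] = c;
        if (c == 0x00)
            out[n++] = 0xFF;    /* escape so an embedded NUL is not a terminator */
    }
    if (n + 2 + 4 > out_cap)
        return 0;
    out[n++] = 0x00;            /* terminator: sorts before any name byte */
    out[n++] = 0x00;
    out[n++] = (unsigned char)(id >> 24);
    out[n++] = (unsigned char)(id >> 16);
    out[n++] = (unsigned char)(id >> 8);
    out[n++] = (unsigned char)id;
    return n;
}
```

The terminator is what lets a shorter name sort first ("a" before "ab") regardless of the parts that follow it; because the string part is self-delimiting, more parts can be appended the same way.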

mdcallag, Nov 05 '13

"and then integrating this into Tarantool." That is the plan I think. :-)

dyu, Nov 05 '13

@mdcallag

I mentioned MySQL here because MySQL doesn't have this feature (a write-optimized storage engine). I have read about the TokuDB storage engine, but although it is GPL'ed, it is patented technology. So we can't use that engine on a production server, right?

  • Anyway, I'm not sure about the consequences of mixing GPL and patents for production use.

awakmu, Nov 06 '13

I am not a lawyer so I won't answer your question about use. TokuDB is distributed as open source and included in MariaDB and Percona/MySQL. My brother works at Tokutek and is happy to speak with potential users.

mdcallag, Nov 06 '13

Hi,

Any idea what the planned release date for sophia v1.3 (or a v1.3 release candidate) is?

ghost, Jan 14 '14

Hello,

Do you need some particular feature, like secondary indexes?

It's been a while, and unfortunately I can't give a fixed date right now. Since the last release, I've made a couple of new engine prototypes, trying to improve sophia's behavior on large data sets and its memory management under high load. It took a lot of time, but I believe I'm on the right path now.

sophia v1.2 development status: https://groups.google.com/forum/#!topic/sophia-database/C8zjKliVS3c

pmwkaa, Jan 16 '14

Yes, I am interested in secondary indexes. But, I was just curious. Lack of secondary indexes is not a showstopper for my project.

I'd rather have a stable engine than new features, so keep up the good work on your current track.

ghost, Jan 18 '14

@pmwkaa any goodies coming soon?

jwerle, May 15 '14

Any news on compression, secondary indexes or networking?

jcspencer, May 15 '14

Yes! After making several prototypes to try new ideas for the internal design, I believe I have found a good one to continue development with.

Work is going according to plan, and the upcoming features are:

  • multi-threaded merger and internal data sharding; sophia will use much less memory
  • storage design made ready for secondary index support (actual support is scheduled for v1.3), with less I/O, and ready for group commit
  • support for multiple databases with online creation/drop; databases share a single environment (thread pool, resources, etc.)
  • MVCC implementation with automatic GC during merge (no external 'vacuum' needed)
  • multi-stmt transactions (optimistic design, with less performance overhead) and consistent cursors
  • hot-backup support
  • the engine implementation has been completely rewritten, for accurate testing and future project development
  • everything is kept simple
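
The "optimistic design" for multi-statement transactions can be sketched as commit-time validation. This assumes every key carries a version counter bumped on each committed write: a transaction records the version of every key it reads and, at commit, aborts if any of them has changed. The store layout and the `otxn_*` functions are hypothetical and single-threaded; a real engine would also have to make validate-and-apply atomic.

```c
#include <stddef.h>
#include <stdint.h>

#define NKEYS   8
#define MAX_OPS 8

/* Hypothetical store: keys are small integers in 0..NKEYS-1. */
struct store { int value[NKEYS]; uint64_t version[NKEYS]; };

struct read_entry  { int key; uint64_t seen; };
struct write_entry { int key; int value; };

struct otxn {
    struct store *db;
    struct read_entry reads[MAX_OPS];   size_t nreads;
    struct write_entry writes[MAX_OPS]; size_t nwrites;
};

void otxn_begin(struct otxn *t, struct store *db)
{
    t->db = db;
    t->nreads = 0;
    t->nwrites = 0;
}

/* Read a committed value and remember which version was seen.
 * Returns -1 if the read set is full (validation would be incomplete). */
int otxn_read(struct otxn *t, int key, int *out)
{
    if (t->nreads == MAX_OPS)
        return -1;
    t->reads[t->nreads].key = key;
    t->reads[t->nreads].seen = t->db->version[key];
    t->nreads++;
    *out = t->db->value[key];
    return 0;
}

/* Buffer a write; nothing touches the store until commit. */
int otxn_write(struct otxn *t, int key, int value)
{
    if (t->nwrites == MAX_OPS)
        return -1;
    t->writes[t->nwrites].key = key;
    t->writes[t->nwrites].value = value;
    t->nwrites++;
    return 0;
}

/* Validate: abort (-1) if any key that was read has since been rewritten.
 * Otherwise apply the buffered writes, bumping each key's version. */
int otxn_commit(struct otxn *t)
{
    for (size_t i = 0; i < t->nreads; i++)
        if (t->db->version[t->reads[i].key] != t->reads[i].seen)
            return -1;
    for (size_t i = 0; i < t->nwrites; i++) {
        t->db->value[t->writes[i].key] = t->writes[i].value;
        t->db->version[t->writes[i].key]++;
    }
    return 0;
}
```

Because conflicts are detected only at commit, reads take no locks, which is the usual argument for lower overhead; the price is that a transaction that loses a conflict must be retried.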

Lately I've started working on integrating sophia as disk storage for the Tarantool project: http://tarantool.org https://github.com/tarantool/tarantool

Since I'm now able to devote more time to sophia's integration and development (as part of the Tarantool team), I plan to make a release in July.

Thanks for the interest! :)

pmwkaa, May 15 '14

Sounds great! I'm looking forward to seeing these new features!

jcspencer, May 15 '14

Development branch has been published: https://github.com/pmwkaa/sophia/tree/dev

https://groups.google.com/forum/#!topic/sophia-database/C8zjKliVS3c announce and intrigue :)

pmwkaa, Jul 22 '14

Woot! Thanks for the detailed update

jwerle, Jul 22 '14

@pmwkaa i'm getting really excited with all this dev work

jwerle, Oct 17 '14

Trying to make it worth the long wait. Hope you guys like it :)

pmwkaa, Oct 19 '14

Lucene is another database you might be interested in. It is the major open source text search engine, and has a modular "codec" plugin design for the actual key-value storage engine.

There have been other projects to use Cassandra as the storage engine for Lucene. The native engine is coded in Java. Sophia might have advantages over it.

http://lucene.apache.org/

LanceNorskog, Apr 16 '15

Thanks, i'll take a look :)

pmwkaa, Apr 17 '15