
providing an alternative tiKV store


Hello :) Lovely project, thank you for this!

Would it be possible to swap the store for a distributed remote one like TiKV?

docteurklein avatar Nov 08 '22 08:11 docteurklein

Thank you for your interest!

In principle, yes, any KV store can be used as the storage layer. The only requirement is the ability to store binary key-value pairs and to scan keys in byte order (some of the ACID properties may have to be relaxed if the store does not have the required operations). We are currently refactoring to make an internal storage API available. When that is done, swapping should be trivial. We may even provide a ready-to-run distribution!
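To make that concrete, here is a minimal sketch of what such a backend interface could look like. This is not our actual internal trait, just an illustration of the two requirements (binary key-value pairs plus byte-ordered range scans):

```rust
// Hypothetical illustration only; the real storage trait may differ.
// A backend needs binary key-value storage plus range scans in byte
// order; transactional guarantees can be layered on top.

use std::ops::Range;

pub trait KvBackend {
    type Error;

    /// Store a binary key-value pair.
    fn put(&mut self, key: &[u8], value: &[u8]) -> Result<(), Self::Error>;

    /// Fetch the value for a binary key, if present.
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, Self::Error>;

    /// Delete a key.
    fn del(&mut self, key: &[u8]) -> Result<(), Self::Error>;

    /// Iterate over key-value pairs in ascending byte order within a range.
    /// This is what makes sorted relations and joins possible on top of
    /// a plain KV store.
    fn scan<'a>(
        &'a self,
        range: Range<Vec<u8>>,
    ) -> Box<dyn Iterator<Item = Result<(Vec<u8>, Vec<u8>), Self::Error>> + 'a>;
}
```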

Would you mind describing your use case for a distributed store a little bit? And do you have any consistency/latency/throughput requirements? What is the maximum amount of data that a single query could touch for your use case?

zh217 avatar Nov 08 '22 11:11 zh217

Thanks for the answers :)

A use case I had in mind was a scalable, multi-tenant database for web frontends. Separating storage from compute would let the stateless compute nodes (cozo) scale independently, while storage backends like FoundationDB or TiKV handle the ACID concurrency concerns.

The advantage would be bottomless scalability, without being bottlenecked by single-node I/O limits, at the cost of network latency.

docteurklein avatar Nov 08 '22 13:11 docteurklein

The storage layer API rewrite is now complete. It works very well with other embedded engines such as SQLite, which opens up the way to run Cozo on mobile (compiling RocksDB for mobile is too much trouble, and the mobile version of RocksDB lacks critical features).
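For anyone following along, selecting an engine from the Rust crate looks roughly like this; exact signatures and option formats may vary between releases, and the non-default engines need the corresponding Cargo features enabled (a sketch, not a spec):

```rust
// Hedged sketch of engine selection via the public Rust API.
// The engine is chosen by name at construction time, so swapping
// backends does not change the queries you run afterwards.

use cozo::DbInstance;

fn main() {
    // Purely in-memory engine.
    let _mem = DbInstance::new("mem", "", Default::default())
        .expect("failed to open in-memory engine");

    // SQLite-backed engine, useful where RocksDB is hard to build (e.g. mobile).
    let _sqlite = DbInstance::new("sqlite", "cozo-data.db", Default::default())
        .expect("failed to open SQLite engine");

    // RocksDB-backed engine for a persistent single-node setup.
    let _rocks = DbInstance::new("rocksdb", "cozo-rocks-dir", Default::default())
        .expect("failed to open RocksDB engine");
}
```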

However, I'm not very positive about the performance when the backend is a distributed store. It works, sure, but our smoke tests take around 130 seconds to complete with the TiKV backend. After some obvious optimizations that went down to around 40 seconds, but going beyond that could take a lot of work and may yield only meager returns. For comparison, with purely in-memory storage the same tests complete in about 0.6 seconds, with the RocksDB backend in about 0.8 seconds, and with the SQLite backend in about 1.2 seconds.

@docteurklein is a one-to-two-orders-of-magnitude drop in performance acceptable for your use case? It certainly isn't for us, as we can pretty much forget about running most of the graph algorithms. Currently, under distributed storage the only queries that can still run efficiently are simple point queries that yield a small number of rows.

Really regaining performance in the distributed setting requires a distributed and replicated KV store suited to this workload. Unfortunately there seem to be no suitable open-source projects that we can just use, and writing one would be significantly more work than this project itself, and is certainly outside its scope.

zh217 avatar Nov 13 '22 05:11 zh217

Closing this, as nothing more can be done at the moment.

zh217 avatar Nov 16 '22 03:11 zh217

Thanks for the feedback! Looks like shared-nothing architectures always win when it comes to perf. I'd love to give TiKV a spin anyway. Did you publish the pluggable TiKV backend?

docteurklein avatar Nov 18 '22 10:11 docteurklein

@docteurklein It will be available with the next release, but it will be highly experimental (and untested).

zh217 avatar Nov 18 '22 10:11 zh217

It has already been several releases since @zh217 tried TiKV as a backend. @docteurklein, did you find some time to try it yourself?

dumblob avatar Jan 24 '23 21:01 dumblob

Not yet, because I would have to recompile cozo with the correct features.
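Once it is compiled in, I'd expect usage to look roughly like the sketch below; the feature flag name, the "tikv" engine name, and the connection string format are all guesses on my part, so check the release notes before relying on any of it:

```rust
// Assumption-heavy sketch: the feature flag, the "tikv" engine name,
// and passing PD endpoints as the path are guesses based on how the
// other engines are selected; consult the cozo docs for the real form.
//
// Build with the TiKV storage feature enabled, presumably something like:
//   cargo build --release --features storage-tikv

use cozo::DbInstance;

fn main() {
    // Hypothetical: point the engine at the TiKV placement driver.
    let db = DbInstance::new("tikv", "127.0.0.1:2379", Default::default())
        .expect("failed to connect to TiKV-backed engine");

    // From here on, queries should be the same regardless of backend.
    drop(db);
}
```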

docteurklein avatar Jan 27 '23 09:01 docteurklein

@zh217 to do this properly (without the drop in performance), it would need some more abstract distribution, right? Like replication and rebalancing based on usage? It seems unlikely it can ever be fast when built on a low-level KV store like that.

tluyben avatar Jul 09 '23 12:07 tluyben