[Sandbox] SlateDB
Application contact emails
Project Summary
A cloud native embedded storage engine built on object storage.
Project Description
SlateDB is an embedded storage engine built as a log-structured merge-tree. Unlike traditional LSM-tree storage engines, SlateDB writes data to object storage (S3, GCS, ABS, MinIO, Tigris, and so on). Leveraging object storage allows SlateDB to provide bottomless storage capacity, high durability, and easy replication. The trade-off is that object storage has higher latency and higher API costs than local disk.
To mitigate high write API costs (PUTs), SlateDB batches writes. Rather than sending every put() call to object storage, SlateDB buffers writes in a MemTable and periodically flushes it to object storage as a sorted string table (SST). The flush interval is configurable.
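The batching idea can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not SlateDB's implementation: `MemTable` and `FakeObjectStore` are hypothetical names, the "object store" is just an in-memory vector, and the flush is triggered by the caller rather than by a timer.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an object store: each element is one SST,
/// written with a single simulated PUT request.
#[derive(Default)]
pub struct FakeObjectStore {
    pub objects: Vec<Vec<(Vec<u8>, Vec<u8>)>>,
}

impl FakeObjectStore {
    /// Number of simulated PUT requests issued so far.
    pub fn put_count(&self) -> usize {
        self.objects.len()
    }
}

/// Hypothetical write buffer that keeps entries sorted by key in memory.
#[derive(Default)]
pub struct MemTable {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemTable {
    /// Buffers a write in memory; no object storage request is made here.
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.entries.insert(key.to_vec(), value.to_vec());
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn is_empty(&self) -> bool {
        self.entries.is_empty()
    }

    /// Drains the buffer into one key-sorted run and writes it with a single
    /// simulated PUT. Returns false (and issues no PUT) if nothing is buffered.
    pub fn flush(&mut self, store: &mut FakeObjectStore) -> bool {
        if self.entries.is_empty() {
            return false;
        }
        // BTreeMap iterates in key order, so the run is already sorted.
        let sst: Vec<(Vec<u8>, Vec<u8>)> =
            std::mem::take(&mut self.entries).into_iter().collect();
        store.objects.push(sst);
        true
    }
}
```

In this sketch, any number of `put` calls between two flushes costs one simulated PUT, which is the cost trade-off the paragraph above describes.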
To mitigate write latency, SlateDB provides an async put method. Clients that prefer strong durability can await on put until the MemTable is flushed to object storage (trading latency for durability). Clients that prefer lower latency can simply ignore the future returned by put.
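The latency/durability choice can be illustrated with a small sketch. It is an assumption-laden model, not SlateDB's API: `BufferedWriter` is a hypothetical type, and a standard-library channel stands in for the future returned by put. A caller that wants durability waits on the returned handle; a caller that wants low latency drops it.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical writer that buffers puts and notifies waiters on flush.
#[derive(Default)]
pub struct BufferedWriter {
    pending: Vec<(String, String)>,
    waiters: Vec<Sender<()>>,
    /// Entries that have been "persisted" by a flush (simulated).
    pub flushed: Vec<(String, String)>,
}

impl BufferedWriter {
    /// Buffers the write and returns a handle that receives a message once
    /// the write has been flushed. Dropping the handle is allowed.
    pub fn put(&mut self, key: &str, value: &str) -> Receiver<()> {
        let (tx, rx) = channel();
        self.pending.push((key.to_string(), value.to_string()));
        self.waiters.push(tx);
        rx
    }

    /// Simulates the periodic flush: persists the batch, then wakes waiters.
    pub fn flush(&mut self) {
        self.flushed.append(&mut self.pending);
        for tx in self.waiters.drain(..) {
            // A caller that dropped its handle chose lower latency over
            // durability, so a failed send is ignored.
            let _ = tx.send(());
        }
    }
}
```

Calling `recv()` on the handle blocks until the flush has happened (trading latency for durability), while dropping the handle returns control to the caller immediately.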
To mitigate read latency and read API costs (GETs), SlateDB uses standard LSM-tree caching techniques: in-memory block caches, compression, bloom filters, and local SST disk caches.
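As one example of these techniques, a bloom filter lets a reader skip an SST (and the GET it would cost) when the filter reports that a key is definitely absent. The sketch below is a generic textbook bloom filter, not SlateDB's filter format; the `BloomFilter` type, its sizing parameters, and the use of the standard library's `DefaultHasher` are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal bloom filter: may report false positives, never false negatives.
pub struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        BloomFilter { bits: vec![false; num_bits], num_hashes }
    }

    /// Derives the bit position for `key` under the hash function `seed`.
    fn index(&self, key: &[u8], seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    /// Records a key by setting one bit per hash function.
    pub fn insert(&mut self, key: &[u8]) {
        for seed in 0..self.num_hashes {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }

    /// Returns false only if the key was definitely never inserted.
    pub fn may_contain(&self, key: &[u8]) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(key, seed)])
    }
}
```

A reader would check `may_contain` before fetching an SST's blocks; a `false` result means the fetch can be skipped, while a `true` result still requires reading the data to confirm.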
Org repo URL (provide if all repos under the org are in scope of the application)
https://github.com/slatedb
Project repo URL in scope of application
N/A
Additional repos in scope of the application
No response
Website URL
https://slatedb.io
Roadmap
https://github.com/slatedb/slatedb/milestones
Roadmap context
SlateDB is a very young project. We don't have a roadmap with specific timelines and dependencies. Instead, we've been using milestones to manage project work.
Contributing Guide
https://github.com/slatedb/slatedb/blob/main/CONTRIBUTING.md
Code of Conduct (CoC)
https://github.com/slatedb/slatedb/blob/main/CODE_OF_CONDUCT.md
Adopters
No response
Contributing or Sponsoring Org
No response
Maintainers file
https://github.com/slatedb/slatedb/blob/main/MAINTAINERS.md
IP Policy
- [X] If the project is accepted, I agree the project will follow the CNCF IP Policy
Trademark and accounts
- [X] If the project is accepted, I agree to donate all project trademarks and accounts to the CNCF
Why CNCF?
I wanted an independent foundation to own the code, trademarks, and so on for SlateDB. I also wanted a foundation to signal that we are adhering to a common-sense, standard project management style. We have multiple companies working on the project, so governance is important. I would love to get some guidance there, as well.
Lastly, I wanted a foundation that didn't insist on antiquated infrastructure such as JIRA.
Benefit to the Landscape
There has been a lot of interest in SlateDB. I wrote a post about the idea initially. Since then, early interest has come from the streaming and durable execution community.
Most adopters previously looked at rocksdb-cloud, but found the lack of documentation and support a bit of a show stopper. Additionally, RocksDB's write-ahead log isn't integrated into object storage, which means it's still stateful. SlateDB, by contrast, allows a completely stateless deployment since all state is persisted in object storage.
SlateDB fits well for systems that are OK with 20-100ms of write latency, but want high durability and easy operations.
Cloud Native 'Fit'
No response
Cloud Native 'Integration'
No response
Cloud Native Overlap
No response
Similar projects
The only similar project I'm familiar with is RocksDB-Cloud.
Landscape
No
Business Product or Service to Project separation
The only adopting company currently contributing is Responsive.dev. They are a streaming company using SlateDB for state management. SlateDB is run entirely separately and we plan to adopt an ICLA.
Project presentations
We have a p99conf presentation coming up in September 2024.
Project champions
No response
Additional information
The main GitHub repo (https://github.com/slatedb/slatedb) is not currently public. We are opening it up on ~August 19. Please contact me if you need early access.
The website is still a work in progress. It's got a lot of Docusaurus boilerplate. Planning to clean that up and write some docs in the next week or two.
Note: SlateDB is now open source and the GitHub repository is publicly available: https://github.com/slatedb/slatedb
@chira001 @raffaelespazzoli @xing-yang Does the TAG have a recommendation regarding this project?
@criccomini would you complete the cloud native fit section of the application?
@TheFoxAtWork I'll be the one reviewing the project. I don't have a recommendation yet.
@criccomini would you complete the cloud native fit section of the application?
Took a shot at this. Let me know if you need more. :)
@criccomini would you or someone on behalf of SlateDB be able to present at TAG Storage? I will contact you on LinkedIn so we can exchange contacts.
(We got in touch over email, and have set a time for Oct 23, 8AM Pacific)
Does SlateDB support Rook or other cloud native object stores? The stores you've named are not cloud native, they're just cloud.
SlateDB uses the object_store Rust crate to interact with object storage. Thus, it supports any object store that object_store supports. These include:
- AWS S3
- Azure Blob Storage
- Google Cloud Storage
- Local files
- Memory
- HTTP/WebDAV Storage
Because of its AWS S3 support, it supports any S3-compatible API such as R2, Tigris, MinIO, and LocalStack as well.
I am not familiar with Rook. Some cursory googling suggests that it supports the S3 API, so it should work with SlateDB.
TAG Contributor Strategy has reviewed this project and found the following:
- The contributor guide is just a stub
- There is no written governance yet.
- The roadmap is a set of feature-based "milestones". It appears relatively new.
- There are four maintainers, who all work for different employers.
This review is for the TOC’s information only. Sandbox projects are not required to have full governance or contributor documentation.
We've decided to withdraw our application. Thanks for all your help along the way.