embucket-labs
Experimental version. A BYOC option for Snowflake workloads
Embucket
Run Snowflake SQL dialect on your data lake in 30 seconds. Zero dependencies.
Quick start
Start Embucket and run your first query in 30 seconds:
docker run --name embucket --rm -p 8080:8080 -p 3000:3000 embucket/embucket-labs
Open localhost:8080 (login: embucket/embucket) and run:
CREATE TABLE sales (id INT, product STRING, revenue DECIMAL(10,2));
INSERT INTO sales VALUES (1, 'Widget A', 1250.00), (2, 'Widget B', 899.50);
SELECT product, revenue FROM sales WHERE revenue > 1000;
Done. You just ran Snowflake SQL dialect on Apache Iceberg tables with zero configuration.
What just happened?
Embucket provides a single binary that gives you a wire-compatible Snowflake replacement:
- Snowflake SQL dialect and API: Use your existing queries, dbt projects, and BI tools
- Apache Iceberg storage: Your data stays in open formats on object storage
- Zero dependencies: No databases, no clusters, no configuration files
- Query-per-node: Each instance handles complete queries independently
Perfect for teams who want Snowflake's simplicity with bring-your-own-cloud control.
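To make the query-per-node idea concrete, here is a toy Python model. It is not Embucket's code: the `Node` and `RoundRobinRouter` names are hypothetical, and round-robin is only an assumed way of spreading queries. The point it illustrates is that each query is handled start to finish by exactly one node, so extra nodes add capacity for concurrent queries.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for one engine instance."""

    name: str
    handled: List[str] = field(default_factory=list)

    def run(self, query: str) -> str:
        # The whole query runs on this node; no work is shipped elsewhere.
        self.handled.append(query)
        return f"{self.name} ran: {query}"


class RoundRobinRouter:
    """Assumed load-spreading policy; any balancer could take its place."""

    def __init__(self, nodes: List[Node]) -> None:
        if not nodes:
            raise ValueError("need at least one node")
        self._next = cycle(nodes)

    def submit(self, query: str) -> str:
        # Pick the next node in turn and let it handle the complete query.
        return next(self._next).run(query)


nodes = [Node("node-a"), Node("node-b")]
router = RoundRobinRouter(nodes)
results = [router.submit(f"SELECT {i}") for i in range(4)]
```

With two nodes and four queries, each node ends up with two complete queries in its `handled` list, and neither node needs to know the other exists.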
Architecture

Zero-disk lakehouse: an architectural approach where all data and metadata live in object storage rather than on compute nodes. Nodes stay stateless and replaceable.
Built on proven open source:
- Apache DataFusion for SQL execution
- Apache Iceberg for ACID transactions
- SlateDB for metadata management
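The zero-disk idea can be sketched with a small Python model. This is an illustration under stated assumptions, not how Embucket is implemented: `ObjectStore` is an in-memory stand-in for real object storage, `StatelessNode` is a hypothetical compute node, and the sketch ignores concurrency and transactional commits entirely.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]


class ObjectStore:
    """Toy in-memory stand-in for object storage (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, List[Row]] = {}

    def put(self, key: str, rows: List[Row]) -> None:
        self._objects[key] = list(rows)

    def get(self, key: str) -> List[Row]:
        # Missing keys read as empty tables to keep the sketch short.
        return list(self._objects.get(key, []))


class StatelessNode:
    """Hypothetical compute node: holds a store handle, no data of its own."""

    def __init__(self, store: ObjectStore) -> None:
        self._store = store

    def append(self, table: str, rows: List[Row]) -> None:
        # Read-modify-write against the store; nothing is kept on the node.
        self._store.put(table, self._store.get(table) + list(rows))

    def scan(self, table: str) -> List[Row]:
        return self._store.get(table)


store = ObjectStore()
first = StatelessNode(store)
first.append("sales", [(1, "Widget A"), (2, "Widget B")])
del first  # the node goes away; the data does not

replacement = StatelessNode(store)
rows = replacement.scan("sales")
```

Because the node keeps nothing locally, the replacement node sees the same rows the first node wrote, which is what makes nodes interchangeable in this model.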
Why Embucket?
Escape the dilemma of choosing between cloud provider lakehouses (Redshift, BigQuery) and the operational complexity of a do-it-yourself lakehouse.
- Radical simplicity - Single binary deployment
- Snowflake SQL dialect compatibility - Works with your existing tools
- Open data - Apache Iceberg format, no lock-in
- Horizontal scaling - Add nodes for more throughput
- Zero operations - No external dependencies to manage
Next steps
Ready for more? Check out the comprehensive documentation:
- Quick start - Detailed setup and first queries
- Architecture - How the zero-disk lakehouse works
- Configuration - Production deployment options
- dbt Integration - Run existing dbt projects
From source:
git clone https://github.com/embucket/embucket-labs.git
cd embucket-labs && cargo build
./target/debug/embucketd
Contributing
Contributions welcome. To get involved:
- Fork the repository on GitHub
- Create a new branch for your feature or bug fix
- Submit a pull request with a detailed description
For more details, see CONTRIBUTING.md.
License
This project uses the Apache 2.0 License. See LICENSE for details.