Embucket

Run Snowflake SQL dialect on your data lake in 30 seconds. Zero dependencies.

Figure: dbt GitLab run results

Quick start

Start Embucket and run your first query in 30 seconds:

docker run --name embucket --rm -p 8080:8080 -p 3000:3000 embucket/embucket-labs

Open localhost:8080, log in with username embucket and password embucket, and run:

CREATE TABLE sales (id INT, product STRING, revenue DECIMAL(10,2));
INSERT INTO sales VALUES (1, 'Widget A', 1250.00), (2, 'Widget B', 899.50);
SELECT product, revenue FROM sales WHERE revenue > 1000;

Done. You just ran Snowflake SQL dialect on Apache Iceberg tables with zero configuration.
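For scripted setups (CI jobs, demos), it can help to start the container in the background and wait until the UI port answers before sending queries. The sketch below is one way to do that, not part of Embucket itself: `wait_for_url` is a hypothetical helper name, and it assumes the service on port 8080 answers a plain HTTP GET with a non-error status once it is ready.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Embucket): poll a URL until it
# answers with a non-error HTTP status, or give up after a number of tries.
# Usage: wait_for_url URL [MAX_ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    url=$1
    max_attempts=${2:-30}
    delay=${3:-1}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # --silent hides progress output; --fail makes HTTP errors (>= 400)
        # return a non-zero exit code; --max-time bounds each try.
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            return 0
        fi
        # Pause between tries, but not after the final one.
        [ "$attempt" -lt "$max_attempts" ] && sleep "$delay"
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (assumption: the UI at port 8080 responds to GET / when ready):
#   docker run --name embucket --rm -d -p 8080:8080 -p 3000:3000 embucket/embucket-labs
#   wait_for_url http://localhost:8080 && echo "Embucket UI is reachable"
```

The only change to the quick-start command is the added `-d` flag, which runs the container detached so the script can continue. Stop it with `docker stop embucket` when finished.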

What just happened?

Embucket provides a single binary that gives you a wire-compatible Snowflake replacement:

Snowflake SQL dialect and API: Use your existing queries, dbt projects, and BI tools
Apache Iceberg storage: Your data stays in open formats on object storage
Zero dependencies: No databases, no clusters, no configuration files
Query-per-node: Each instance handles complete queries independently

Perfect for teams who want Snowflake's simplicity with bring-your-own-cloud control.

Architecture

Figure: Embucket architecture

Zero-disk lakehouse: an architectural approach where all data and metadata live in object storage rather than on compute nodes. Nodes stay stateless and replaceable.

Built on proven open source:

Apache DataFusion for SQL execution
Apache Iceberg for ACID transactions
SlateDB for metadata management

Why Embucket?

Escape the dilemma of choosing between a cloud provider's lakehouse (Redshift, BigQuery) and the operational complexity of a do-it-yourself lakehouse.

Radical simplicity - Single binary deployment
Snowflake SQL dialect compatibility - Works with your existing tools
Open data - Apache Iceberg format, no lock-in
Horizontal scaling - Add nodes for more throughput
Zero operations - No external dependencies to manage

Next steps

Ready for more? Check out the comprehensive documentation:

Quick start - Detailed setup and first queries
Architecture - How the zero-disk lakehouse works
Configuration - Production deployment options
dbt Integration - Run existing dbt projects

To build and run from source:

git clone https://github.com/embucket/embucket-labs.git
cd embucket-labs && cargo build
./target/debug/embucketd

Contributing

Contributions welcome. To get involved:

Fork the repository on GitHub
Create a new branch for your feature or bug fix
Submit a pull request with a detailed description

For more details, see CONTRIBUTING.md.

License

This project uses the Apache 2.0 License. See LICENSE for details.
