Rafka
A High-Performance Distributed Message Broker Built in Rust
Rafka is a blazing-fast, experimental distributed asynchronous message broker inspired by Apache Kafka. Built with Rust and leveraging Tokio's async runtime, it delivers exceptional performance through its peer-to-peer mesh architecture and custom in-memory database for unparalleled scalability and low-latency message processing.
🚀 Key Features
- High-Performance Async Architecture: Built on Tokio for maximum concurrency and throughput
- gRPC Communication: Modern protocol buffers for efficient inter-service communication
- Partitioned Message Processing: Hash-based partitioning for horizontal scalability
- Disk-based Persistence: Write-Ahead Log (WAL) for message durability
- Consumer Groups: Load-balanced message consumption with partition assignment
- Replication: Multi-replica partitions with ISR tracking for high availability
- Log Compaction: Multiple strategies (KeepLatest, TimeWindow, Hybrid) for storage optimization
- Transactions: Two-Phase Commit (2PC) with idempotent producer support
- Comprehensive Monitoring: Health checks, heartbeat tracking, and circuit breakers
- Real-time Metrics: Prometheus-compatible metrics export with latency histograms
- Stream Processing: Kafka Streams-like API for message transformation and aggregation
- Offset Tracking: Consumer offset management for reliable message delivery
- Retention Policies: Configurable message retention based on age and size
- Modular Design: Clean separation of concerns across multiple crates
📊 Rafka vs Apache Kafka Feature Comparison
| Feature | Apache Kafka | Rafka (Current) | Status |
|---|---|---|---|
| Storage | Disk-based (Persistent) | Disk-based WAL (Persistent) | ✅ Implemented |
| Architecture | Leader/Follower (ZooKeeper/KRaft) | P2P Mesh / Distributed | 🔄 Different Approach |
| Consumption Model | Consumer Groups (Load Balancing) | Consumer Groups + Pub/Sub | ✅ Implemented |
| Replication | Multi-replica with ISR | Multi-replica with ISR | ✅ Implemented |
| Message Safety | WAL (Write-Ahead Log) | WAL (Write-Ahead Log) | ✅ Implemented |
| Transactions | Exactly-once semantics | 2PC with Idempotent Producers | ✅ Implemented |
| Compaction | Log Compaction | Log Compaction (Multiple Strategies) | ✅ Implemented |
| Ecosystem | Connect, Streams, Schema Registry | Core Broker only | ❌ Missing |
📋 Feature Implementation Status
✅ Implemented Features
- Disk-based Persistence (WAL): Rafka now implements a Write-Ahead Log (WAL) for message durability. Messages are persisted to disk and survive broker restarts.
- Consumer Groups: Rafka supports consumer groups with load balancing. Multiple consumers can share the load of a topic, with each partition being consumed by only one member of the group. Both Range and RoundRobin partition assignment strategies are supported.
- Replication & High Availability: Rafka implements multi-replica partitions with In-Sync Replica (ISR) tracking and leader election for high availability.
- Log Compaction: Rafka supports log compaction with multiple strategies (KeepLatest, TimeWindow, Hybrid) to optimize storage by keeping only the latest value for a key.
- Transactions: Rafka implements atomic writes across multiple partitions/topics using Two-Phase Commit (2PC) protocol with idempotent producer support.
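The KeepLatest compaction strategy mentioned above can be illustrated with a minimal, self-contained sketch. The types and function below are hypothetical, not Rafka's actual internals; the idea is simply to scan the log from newest to oldest and keep only the latest record per key.

```rust
use std::collections::HashSet;

/// A simplified log record: (key, value, offset).
type Record = (String, String, u64);

/// KeepLatest compaction sketch: walk the log newest-first, keep the first
/// record seen for each key (i.e. the latest one), then restore log order.
fn compact_keep_latest(log: &[Record]) -> Vec<Record> {
    let mut seen = HashSet::new();
    let mut kept: Vec<Record> = log
        .iter()
        .rev()
        .filter(|(key, _, _)| seen.insert(key.clone())) // true only the first time a key is seen
        .cloned()
        .collect();
    kept.reverse(); // back to oldest-to-newest
    kept
}

fn main() {
    let log = vec![
        ("user-1".to_string(), "v1".to_string(), 0),
        ("user-2".to_string(), "v1".to_string(), 1),
        ("user-1".to_string(), "v2".to_string(), 2),
    ];
    let compacted = compact_keep_latest(&log);
    // Only the latest value per key survives compaction.
    assert_eq!(compacted.len(), 2);
    assert_eq!(compacted[0].0, "user-2");
    assert_eq!(compacted[1], ("user-1".to_string(), "v2".to_string(), 2));
    println!("{:?}", compacted);
}
```

A TimeWindow strategy would additionally keep superseded records newer than some cutoff; Hybrid combines both checks.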
❌ Missing Features
- Ecosystem Tools: Unlike Apache Kafka, Rafka currently lacks ecosystem tools like Kafka Connect (for data integration), Kafka Streams (for stream processing), and Schema Registry (for schema management). These would need to be developed separately to provide a complete data streaming platform.
🏗️ Architecture Overview
System Architecture Diagram
```mermaid
graph TB
    subgraph "Client Layer"
        P[Producer]
        C[Consumer]
    end

    subgraph "Broker Cluster"
        B1[Broker 1<br/>Partition 0]
        B2[Broker 2<br/>Partition 1]
        B3[Broker 3<br/>Partition 2]
    end

    subgraph "Storage Layer"
        S1[In-Memory DB<br/>Partition 0]
        S2[In-Memory DB<br/>Partition 1]
        S3[In-Memory DB<br/>Partition 2]
    end

    P -->|gRPC Publish| B1
    P -->|gRPC Publish| B2
    P -->|gRPC Publish| B3

    B1 -->|Store Messages| S1
    B2 -->|Store Messages| S2
    B3 -->|Store Messages| S3

    C -->|gRPC Consume| B1
    C -->|gRPC Consume| B2
    C -->|gRPC Consume| B3

    B1 -->|Broadcast Stream| C
    B2 -->|Broadcast Stream| C
    B3 -->|Broadcast Stream| C
```
Message Flow Sequence Diagram
```mermaid
sequenceDiagram
    participant P as Producer
    participant B as Broker
    participant S as Storage
    participant C as Consumer

    P->>B: PublishRequest(topic, key, payload)
    B->>B: Hash key for partition
    B->>B: Check partition ownership
    B->>S: Store message with offset
    S-->>B: Return offset
    B->>B: Broadcast to subscribers
    B-->>P: PublishResponse(message_id, offset)

    C->>B: ConsumeRequest(topic)
    B->>B: Create broadcast stream
    B-->>C: ConsumeResponse stream

    loop Message Processing
        B->>C: ConsumeResponse(message)
        C->>B: AcknowledgeRequest(message_id)
        C->>B: UpdateOffsetRequest(offset)
    end
```
📁 Project Structure
```
rafka/
├── Cargo.toml                      # Workspace manifest
├── config/
│   └── config.yml                  # Configuration file
├── scripts/                        # Demo and utility scripts
│   ├── helloworld.sh               # Basic producer-consumer demo
│   ├── partitioned_demo.sh         # Multi-broker partitioning demo
│   ├── retention_demo.sh           # Message retention demo
│   ├── offset_tracking_demo.sh     # Consumer offset tracking demo
│   └── kill.sh                     # Process cleanup script
├── src/
│   └── bin/                        # Executable binaries
│       ├── start_broker.rs         # Broker server
│       ├── start_producer.rs       # Producer client
│       ├── start_consumer.rs       # Consumer client
│       └── check_metrics.rs        # Metrics monitoring
├── crates/                         # Core library crates
│   ├── core/                       # Core types and gRPC definitions
│   │   ├── src/
│   │   │   ├── lib.rs
│   │   │   ├── message.rs          # Message structures
│   │   │   └── proto/
│   │   │       └── rafka.proto     # gRPC service definitions
│   │   └── build.rs                # Protocol buffer compilation
│   ├── broker/                     # Broker implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── broker.rs           # Core broker logic
│   ├── producer/                   # Producer implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── producer.rs         # Producer client
│   ├── consumer/                   # Consumer implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── consumer.rs         # Consumer client
│   └── storage/                    # Storage engine
│       └── src/
│           ├── lib.rs
│           └── db.rs               # In-memory database
├── docs/
│   └── getting_started.md          # Getting started guide
├── tasks/
│   └── Roadmap.md                  # Development roadmap
├── Dockerfile                      # Container configuration
└── LICENSE                         # MIT License
```
🚀 Quick Start
Prerequisites
- Rust: Latest stable version (1.70+)
- Cargo: Comes with Rust installation
- Protocol Buffers: For gRPC compilation
Installation
- Clone the repository:

```shell
git clone https://github.com/yourusername/rafka.git
cd rafka
```

- Build the project:

```shell
cargo build --release
```

- Run the basic demo:

```shell
./scripts/helloworld.sh
```

Manual Setup
- Start a broker:

```shell
cargo run --bin start_broker -- --port 50051 --partition 0 --total-partitions 3
```

- Start a consumer:

```shell
cargo run --bin start_consumer -- --port 50051
```

- Send messages:

```shell
cargo run --bin start_producer -- --message "Hello, Rafka!" --key "test-key"
```
🔧 Configuration
Broker Configuration
The broker can be configured via command-line arguments:
```shell
cargo run --bin start_broker -- \
    --port 50051 \
    --partition 0 \
    --total-partitions 3 \
    --retention-seconds 604800
```
Available Options:
- `--port`: Broker listening port (default: 50051)
- `--partition`: Partition ID for this broker (default: 0)
- `--total-partitions`: Total number of partitions (default: 1)
- `--retention-seconds`: Message retention time in seconds (default: 7 days)
Configuration File
Edit config/config.yml for persistent settings:
```yaml
server:
  host: "127.0.0.1"
  port: 9092

log:
  level: "info"  # debug, info, warn, error

broker:
  replication_factor: 3
  default_topic_partitions: 1

storage:
  type: "in_memory"
```
🏗️ Core Components
1. Core (rafka-core)
Purpose: Defines fundamental types and gRPC service contracts.
Key Components:
- Message Structures: `Message`, `MessageAck`, `BenchmarkMetrics`
- gRPC Definitions: Protocol buffer definitions for all services
- Serialization: Serde-based serialization for message handling
Key Files:
- `message.rs`: Core message types and acknowledgment structures
- `proto/rafka.proto`: gRPC service definitions
2. Broker (rafka-broker)
Purpose: Central message routing and coordination service.
Key Features:
- Partition Management: Hash-based message partitioning
- Topic Management: Dynamic topic creation and subscription
- Broadcast Channels: Efficient message distribution to consumers
- Offset Tracking: Consumer offset management
- Retention Policies: Configurable message retention
- Metrics Collection: Real-time performance metrics
Key Operations:
- `publish()`: Accept messages from producers
- `consume()`: Stream messages to consumers
- `subscribe()`: Register consumer subscriptions
- `acknowledge()`: Process message acknowledgments
- `update_offset()`: Track consumer progress
3. Producer (rafka-producer)
Purpose: Client library for publishing messages to brokers.
Key Features:
- Connection Management: Automatic broker connection handling
- Message Publishing: Reliable message delivery with acknowledgments
- Error Handling: Comprehensive error reporting
- UUID Generation: Unique message identification
Usage Example:
```rust
let mut producer = Producer::new("127.0.0.1:50051").await?;
producer
    .publish("my-topic".to_string(), "Hello World".to_string(), "key-1".to_string())
    .await?;
```
4. Consumer (rafka-consumer)
Purpose: Client library for consuming messages from brokers.
Key Features:
- Subscription Management: Topic subscription handling
- Stream Processing: Asynchronous message streaming
- Automatic Acknowledgment: Built-in message acknowledgment
- Offset Tracking: Automatic offset updates
- Channel-based API: Clean async/await interface
Usage Example:
```rust
let mut consumer = Consumer::new("127.0.0.1:50051").await?;
consumer.subscribe("my-topic".to_string()).await?;

let mut rx = consumer.consume("my-topic".to_string()).await?;
while let Some(message) = rx.recv().await {
    println!("Received: {}", message);
}
```
5. Storage (rafka-storage)
Purpose: High-performance in-memory storage engine.
Key Features:
- Partition-based Storage: Separate queues per partition
- Retention Policies: Age and size-based message retention
- Offset Management: Efficient offset tracking and retrieval
- Acknowledgment Tracking: Consumer acknowledgment management
- Metrics Collection: Storage performance metrics
- Memory Optimization: Efficient memory usage with cleanup
Storage Architecture:
```mermaid
graph LR
    subgraph "Storage Engine"
        T[Topic]
        P1[Partition 0]
        P2[Partition 1]
        P3[Partition 2]

        T --> P1
        T --> P2
        T --> P3

        P1 --> Q1[Message Queue]
        P2 --> Q2[Message Queue]
        P3 --> Q3[Message Queue]
    end
```
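The partition-per-queue layout above can be sketched in a few lines. This is an illustrative model only; the struct and method names (`InMemoryStore`, `append`, `read_from`) are hypothetical and not Rafka's actual API.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal sketch of a partitioned in-memory store: one message queue per
/// partition, with a monotonically increasing offset per partition.
struct InMemoryStore {
    partitions: HashMap<u32, VecDeque<(u64, String)>>, // partition -> (offset, payload)
    next_offset: HashMap<u32, u64>,
}

impl InMemoryStore {
    fn new() -> Self {
        Self { partitions: HashMap::new(), next_offset: HashMap::new() }
    }

    /// Append a message to a partition's queue and return its offset.
    fn append(&mut self, partition: u32, payload: String) -> u64 {
        let offset = *self.next_offset.entry(partition).or_insert(0);
        self.partitions.entry(partition).or_default().push_back((offset, payload));
        self.next_offset.insert(partition, offset + 1);
        offset
    }

    /// Read all messages at or after `from_offset` in a partition.
    fn read_from(&self, partition: u32, from_offset: u64) -> Vec<&(u64, String)> {
        self.partitions
            .get(&partition)
            .map(|q| q.iter().filter(|(o, _)| *o >= from_offset).collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut store = InMemoryStore::new();
    assert_eq!(store.append(0, "a".into()), 0);
    assert_eq!(store.append(0, "b".into()), 1);
    assert_eq!(store.append(1, "c".into()), 0); // offsets are per-partition
    assert_eq!(store.read_from(0, 1).len(), 1);
    println!("partition 0 from offset 1: {:?}", store.read_from(0, 1));
}
```

Using a `VecDeque` per partition makes retention cheap: expired messages are popped from the front while new ones are pushed to the back.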
🔄 Message Flow
Publishing Flow
1. Producer sends `PublishRequest` to the Broker
2. Broker hashes the message key to determine the partition
3. Broker checks partition ownership
4. Broker stores the message in Storage with a unique offset
5. Broker broadcasts the message to subscribed consumers
6. Broker returns `PublishResponse` with the message ID and offset
Consumption Flow
1. Consumer sends `ConsumeRequest` to the Broker
2. Broker creates a broadcast stream for the topic
3. Broker streams messages to the Consumer via gRPC
4. Consumer processes each message and sends an acknowledgment
5. Consumer updates its offset to track progress
6. Storage cleans up acknowledged messages according to the retention policy
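The offset-tracking step in the consumption flow can be sketched as follows. The names here (`OffsetTracker`, `update_offset`, `resume_from`) are illustrative assumptions, not Rafka's actual types: the broker simply records, per (consumer group, topic, partition), the highest offset a group has committed.

```rust
use std::collections::HashMap;

/// Sketch of broker-side consumer offset tracking.
#[derive(Default)]
struct OffsetTracker {
    committed: HashMap<(String, String, u32), u64>, // (group, topic, partition) -> offset
}

impl OffsetTracker {
    /// Commit an offset; only moves forward, so a stale commit is ignored.
    fn update_offset(&mut self, group: &str, topic: &str, partition: u32, offset: u64) {
        let entry = self
            .committed
            .entry((group.to_string(), topic.to_string(), partition))
            .or_insert(0);
        if offset > *entry {
            *entry = offset;
        }
    }

    /// Where this group should resume consuming after a restart.
    fn resume_from(&self, group: &str, topic: &str, partition: u32) -> u64 {
        *self
            .committed
            .get(&(group.to_string(), topic.to_string(), partition))
            .unwrap_or(&0)
    }
}

fn main() {
    let mut tracker = OffsetTracker::default();
    tracker.update_offset("g1", "my-topic", 0, 5);
    tracker.update_offset("g1", "my-topic", 0, 3); // stale commit, ignored
    assert_eq!(tracker.resume_from("g1", "my-topic", 0), 5);
    assert_eq!(tracker.resume_from("g1", "my-topic", 1), 0); // nothing committed yet
}
```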
📈 Performance Features
Partitioning Strategy
Rafka uses hash-based partitioning for efficient message distribution:
```rust
fn hash_key(&self, key: &str) -> u32 {
    key.bytes().fold(0u32, |acc, b| acc.wrapping_add(b as u32))
}

fn owns_partition(&self, message_key: &str) -> bool {
    let hash = self.hash_key(message_key);
    hash % self.total_partitions == self.partition_id
}
```
Retention Policies
Configurable message retention based on:
- Time-based: Maximum age (default: 7 days)
- Size-based: Maximum storage size (default: 1GB)
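Combining both limits can be sketched as a single pass over the log, oldest message first. This is an illustrative model under assumed types, not Rafka's actual retention code:

```rust
use std::collections::VecDeque;
use std::time::{Duration, SystemTime};

/// Evict messages from the front of the log while the oldest one is past
/// `max_age` or the total stored size exceeds `max_bytes`.
fn enforce_retention(
    log: &mut VecDeque<(SystemTime, Vec<u8>)>, // (arrival time, payload), oldest first
    max_age: Duration,
    max_bytes: usize,
) {
    let now = SystemTime::now();
    let mut total: usize = log.iter().map(|(_, p)| p.len()).sum();
    loop {
        let (too_old, len) = match log.front() {
            Some((ts, payload)) => (
                now.duration_since(*ts).map(|age| age > max_age).unwrap_or(false),
                payload.len(),
            ),
            None => break,
        };
        if too_old || total > max_bytes {
            total -= len;
            log.pop_front();
        } else {
            break; // remaining messages are newer and within the size budget
        }
    }
}

fn main() {
    let now = SystemTime::now();
    let ten_days_ago = now - Duration::from_secs(10 * 24 * 3600);
    let mut log: VecDeque<_> = vec![(ten_days_ago, vec![0u8; 4]), (now, vec![0u8; 4])].into();
    // A 7-day age limit evicts the 10-day-old message; the fresh one stays.
    enforce_retention(&mut log, Duration::from_secs(7 * 24 * 3600), 1024);
    assert_eq!(log.len(), 1);
}
```

Because messages are appended in arrival order, checking only the front of the queue is enough: once a message passes both checks, everything behind it does too.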
Metrics Collection
Built-in metrics for monitoring:
- Total messages stored
- Total bytes consumed
- Oldest message age
- Consumer offset positions
🧪 Demo Scripts
1. Hello World Demo
```shell
./scripts/helloworld.sh
```
Basic producer-consumer interaction demonstration.
2. Partitioned Demo
```shell
./scripts/partitioned_demo.sh
```
Multi-broker setup with hash-based partitioning.
3. Retention Demo
```shell
./scripts/retention_demo.sh
```
Demonstrates message retention policies.
4. Offset Tracking Demo
```shell
./scripts/offset_tracking_demo.sh
```
Shows consumer offset management and recovery.
🛠️ Development
Building from Source
```shell
# Clone repository
git clone https://github.com/yourusername/rafka.git
cd rafka

# Build all crates
cargo build

# Run tests
cargo test

# Build release version
cargo build --release
```
Running Tests
```shell
# Run all tests
cargo test

# Run specific crate tests
cargo test -p rafka-storage
cargo test -p rafka-broker
```
Code Structure
The project follows Rust best practices with:
- Workspace Organization: Multiple crates in a single workspace
- Separation of Concerns: Each component in its own crate
- Async/Await: Modern async Rust with Tokio
- Error Handling: Comprehensive error types and handling
- Testing: Unit tests for all major components
🚧 Current Status
⚠️ Early Development - Not Production Ready
Rafka is currently in active development. The current implementation provides:
✅ Completed Features:
- Basic message publishing and consumption
- Hash-based partitioning
- In-memory storage with retention policies
- Consumer offset tracking
- gRPC-based communication
- Metrics collection
- Demo scripts and examples
🔄 In Progress:
- Peer-to-peer mesh networking
- Distributed consensus algorithms
- Kubernetes deployment configurations
- Performance optimizations
📋 Planned Features:
- Replication across multiple brokers
- Fault tolerance and recovery
- Security and authentication
- Client SDKs for multiple languages
- Comprehensive monitoring and alerting
🤝 Contributing
We welcome contributions! Here are some areas where you can help:
High Priority
- P2P Mesh Implementation: Distributed node discovery and communication
- Consensus Algorithms: Leader election and cluster coordination
- Replication: Cross-broker message replication
- Fault Tolerance: Node failure detection and recovery
Medium Priority
- Performance Optimization: Message batching and compression
- Security: TLS encryption and authentication
- Monitoring: Prometheus metrics and Grafana dashboards
- Documentation: API documentation and tutorials
Getting Started
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Apache Kafka for inspiration on messaging systems
- Tokio for the excellent async runtime
- Tonic for gRPC implementation
- @wyattgill9 for the initial proof of concept
- The Rust community for their excellent libraries and support
Built with ❤️ in Rust