hudi-rs
Feature Request: Support Clustering in hudi-rs
Feature Description
Clustering is a core optimization feature in Apache Hudi, widely used to manage small files and improve query performance.
I’d love to see support for clustering in hudi-rs, which could handle this efficiently thanks to Rust’s performance. This would enable production-grade optimization workflows in native Rust pipelines.
Why this matters
- Performance: Compute-intensive operations such as clustering are expected to run considerably faster in native Rust.
- Ease of migration: Users who already run standalone clustering should be able to migrate to hudi-rs clustering easily.
- Ecosystem trend: Similar efforts are emerging in other formats, e.g. Iceberg compaction.
Suggested Scope
Initial support might include:
- Reading clustering plans
- Executing clustering as a standalone action
- Supporting inline clustering in write paths (optional follow-up)
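
To make the first two items more concrete, here is a minimal sketch of what a decoded clustering plan and a standalone executor could look like. Every name in it (`FileSlice`, `ClusteringGroup`, `ClusteringPlan`, `GroupOutcome`, `execute_group`, `execute_plan`) is hypothetical and does not exist in hudi-rs today. The sketch only computes which file ids a group would replace and how many output files it would need for an assumed target file size; it does not read the timeline or rewrite any data.

```rust
// Hypothetical types for illustration only; none of these exist in hudi-rs.

/// One existing file slice that a clustering group would rewrite.
#[derive(Debug, Clone, PartialEq)]
struct FileSlice {
    file_id: String,
    size_bytes: u64,
}

/// A set of file slices that are rewritten together into new files.
#[derive(Debug, Clone, PartialEq)]
struct ClusteringGroup {
    slices: Vec<FileSlice>,
}

/// A clustering plan as it might look after being decoded from the timeline.
#[derive(Debug, Clone, PartialEq)]
struct ClusteringPlan {
    instant_time: String,
    groups: Vec<ClusteringGroup>,
}

/// Summary of executing one group: the file ids it replaces and the number
/// of output files needed to hold the group's total bytes.
#[derive(Debug, PartialEq)]
struct GroupOutcome {
    replaced_file_ids: Vec<String>,
    output_file_count: u64,
}

/// Summarizes one group (assumption: output files are capped at
/// `target_file_bytes`, and data size is unchanged by the rewrite).
fn execute_group(group: &ClusteringGroup, target_file_bytes: u64) -> GroupOutcome {
    assert!(target_file_bytes > 0, "target file size must be positive");
    let total_bytes: u64 = group.slices.iter().map(|s| s.size_bytes).sum();
    // Ceiling division; an empty group needs no output files.
    let output_file_count = (total_bytes + target_file_bytes - 1) / target_file_bytes;
    GroupOutcome {
        replaced_file_ids: group.slices.iter().map(|s| s.file_id.clone()).collect(),
        output_file_count,
    }
}

/// Runs every group in the plan in order, as a standalone action would.
fn execute_plan(plan: &ClusteringPlan, target_file_bytes: u64) -> Vec<GroupOutcome> {
    plan.groups
        .iter()
        .map(|group| execute_group(group, target_file_bytes))
        .collect()
}
```

A real implementation would additionally need to decode the plan from the table's timeline, rewrite the data files, and commit the result, which depends on the write support listed under Prerequisites.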
Prerequisites
- hudi-rs does not yet have write support. It needs to be able to write and commit data before we implement more complicated table services such as clustering.