
Feature Request: Support Clustering in hudi-rs

Open CTTY opened this issue 5 months ago • 0 comments

Feature Description

Clustering is a core optimization feature in Apache Hudi, widely used to manage small files and improve query performance.

I’d love to see clustering supported in hudi-rs, where Rust’s performance could make it efficient. This would enable production-grade optimization workflows in native Rust pipelines.

Why this matters

  • Performance: Rust is expected to make compute-intensive operations like clustering much more performant
  • Ease of migration: Users who are using standalone clustering should be able to migrate to hudi-rs clustering easily
  • Ecosystem trend: Similar efforts are emerging in other formats, e.g. Iceberg compaction.

Suggested Scope

Initial support might include:

  • Reading clustering plans
  • Executing clustering as a standalone action
  • Supporting inline clustering in write paths (optional follow-up)
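Here is a minimal Rust sketch of the planning half of clustering, to make "reading clustering plans" and "executing clustering" more concrete. It selects small files and packs them into rewrite groups. The types `FileSlice` and `ClusteringGroup` and the function `plan_clustering` are hypothetical and are not hudi-rs or Apache Hudi APIs. The two parameters only loosely echo the idea behind Hudi's size-based plan strategy, which uses a small-file limit and a maximum size per group. The first-fit-decreasing packing is a choice made for this sketch, and Hudi's own strategies may group files differently. Executing a group would then mean reading its files and writing them back as fewer, larger files, and that step depends on the write-and-commit support listed under Prerequisites.

```rust
// Hypothetical types for illustration only; NOT the hudi-rs or Apache Hudi API.
#[derive(Debug, Clone)]
struct FileSlice {
    path: String,
    size_bytes: u64,
}

#[derive(Debug)]
struct ClusteringGroup {
    slices: Vec<FileSlice>,
}

impl ClusteringGroup {
    /// Total size of all file slices assigned to this group.
    fn total_bytes(&self) -> u64 {
        self.slices.iter().map(|s| s.size_bytes).sum()
    }
}

/// Hypothetical size-based planner: files smaller than `small_file_limit` are
/// packed into groups of at most `max_group_bytes` using first-fit decreasing.
fn plan_clustering(files: &[FileSlice], small_file_limit: u64, max_group_bytes: u64) -> Vec<ClusteringGroup> {
    // Only small files are candidates for rewriting.
    let mut candidates: Vec<&FileSlice> =
        files.iter().filter(|f| f.size_bytes < small_file_limit).collect();
    // First-fit decreasing: place the largest candidates first.
    candidates.sort_by(|a, b| b.size_bytes.cmp(&a.size_bytes));

    let mut groups: Vec<ClusteringGroup> = Vec::new();
    for file in candidates {
        // Put the file into the first existing group that still has room.
        let slot = groups
            .iter()
            .position(|g| g.total_bytes() + file.size_bytes <= max_group_bytes);
        match slot {
            Some(i) => groups[i].slices.push(file.clone()),
            None => groups.push(ClusteringGroup { slices: vec![file.clone()] }),
        }
    }
    // A one-file group would rewrite a file without merging anything, so skip it.
    groups.retain(|g| g.slices.len() > 1);
    groups
}

fn main() {
    const MB: u64 = 1024 * 1024;
    let files: Vec<FileSlice> = [("a", 10), ("b", 20), ("c", 300), ("d", 30), ("e", 40), ("f", 50)]
        .iter()
        .map(|&(name, mb)| FileSlice { path: format!("p/{name}.parquet"), size_bytes: mb * MB })
        .collect();

    // c is above the small-file limit; d ends up alone and is skipped.
    let plan = plan_clustering(&files, 100 * MB, 64 * MB);
    for (i, g) in plan.iter().enumerate() {
        let paths: Vec<&str> = g.slices.iter().map(|s| s.path.as_str()).collect();
        println!("group {i}: {:?} ({} MB)", paths, g.total_bytes() / MB);
    }
}
```

With these inputs the sketch produces two groups, `[f, a]` and `[e, b]`, each totalling 60 MB. First-fit decreasing is a simple heuristic that usually packs groups close to the size cap. A real planner would also need to respect partitions, file-group boundaries, and any sort columns.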

Prerequisites

  • hudi-rs does not yet have write support. It needs to be able to write and commit data before we implement complex table services such as clustering.

Additional context

No response

CTTY · Jul 15 '25 03:07