iceberg-rust icon indicating copy to clipboard operation
iceberg-rust copied to clipboard

HDFS support in iceberg-rust

Open Xuanwo opened this issue 9 months ago • 18 comments

There are several ways to access HDFS in rust:

  • libhdfs + JNI: needs java runtime and libhdfs
  • WebHDFS: http based, may have performance issues
  • HDFS Native: rust native but not well adopted.

OpenDAL supports all these methods. However, from Iceberg's perspective, it's better to expose a single hdfs storage option and allow users to choose the method via feature flags to avoid confusion. This is because we write service names in our metadata files, and all implementations must be able to read hdfs://host:port/files.

So my current plan is to introduce webhdfs and hdfs_native support in iceberg-rust. They will hide under storage-webhdfs and storage-hdfs-native flag:

Users can choose which implementations to use:

fn parse_scheme(scheme: &str) -> crate::Result<Scheme> {
    match scheme {
        "memory" => Ok(Scheme::Memory),
        "file" | "" => Ok(Scheme::Fs),
        "s3" | "s3a" => Ok(Scheme::S3),
        "gs" | "gcs" => Ok(Scheme::Gcs),
        #[cfg(feature = "storage-webhdfs")]
        "hdfs" => Ok(Scheme::Webhdfs),
        #[cfg(feature = "storage-hdfs-native")]
        "hdfs" => Ok(Scheme::HdfsNative),
        s => Ok(s.parse::<Scheme>()?),
    }
}

See https://github.com/apache/iceberg-rust/pull/1131 for more details about this design.

Xuanwo avatar Mar 24 '25 04:03 Xuanwo

cc @liurenjie1024, let's discuss here.

I'm also fine to add a new services called webhdfs here. But the location returned by hive will still be hdfs://host:port/abc. It will be a bit confusing.

Xuanwo avatar Mar 25 '25 02:03 Xuanwo

cc @liurenjie1024, let's discuss here.

I'm also fine to add a new services called webhdfs here. But the location returned by hive will still be hdfs://host:port/abc. It will be a bit confusing.

Do you mean that even if user is using webhdfs, hive metastore will still return hdfs path?

liurenjie1024 avatar Mar 25 '25 02:03 liurenjie1024

Do you mean that even if user is using webhdfs, hive metastore will still return hdfs path?

Yep. There is no iceberg impls will write webhdfs://xxx.

Xuanwo avatar Mar 25 '25 03:03 Xuanwo

Do you mean that even if user is using webhdfs, hive metastore will still return hdfs path?

Yep. There is no iceberg impls will write webhdfs://xxx.

Seems there are several problems needs to resolve here:

  1. Should we add webhdfs support as feature? It's mainly compile time support or runtime support for webhdfs
  2. Should we treat webhdfs/hdfs as same storage or different service?

Is there other problems to discuss?

liurenjie1024 avatar Mar 25 '25 03:03 liurenjie1024

  1. It's mainly compile time support or runtime support for webhdfs

What's the difference about this?

Xuanwo avatar Mar 25 '25 04:03 Xuanwo

  1. It's mainly compile time support or runtime support for webhdfs

What's the difference about this?

Oh, I was thinking about treating hdfs/webhdfs as same storage service. I took a look your pr, if they are treated as different stoarge, then there is difference.

liurenjie1024 avatar Mar 25 '25 05:03 liurenjie1024

Oh, I was thinking about treating hdfs/webhdfs as same storage service. I took a look your pr, if they are treated as different stoarge, then there is difference.

I see, you mean that we have a single Storage Hdfs but provides a config like use_webhdfs for users to switch?

Xuanwo avatar Mar 25 '25 05:03 Xuanwo

Oh, I was thinking about treating hdfs/webhdfs as same storage service. I took a look your pr, if they are treated as different stoarge, then there is difference.

I see, you mean that we have a single Storage Hdfs but provides a config like use_webhdfs for users to switch?

Yes, but I'm not sure it's the best approach.

liurenjie1024 avatar Mar 25 '25 05:03 liurenjie1024

I see. We have three approachs so far:

  1. Support storage-webhdfs and storage-hdfs but only exposes as hdfs to users

Users are required to use feature flags for configuration.

  1. Have only storage-hdfs but allow users to pick up impls like webhdfs or hdfs at runtime by config.

Provides more control for users, but it may be a bit confusing when feature flags don't match the configuration.

  1. Exposes storage-webhdfs and storage-hdfs seperately

It doesn't work for webhdfs since all tables will use hdfs://abc.

Xuanwo avatar Mar 25 '25 05:03 Xuanwo

Have you considered including libhdfs + JNI in this design? Are we intentionally leaving it out for now, or could it be explored as a separate approach?

oven-yang avatar Mar 27 '25 07:03 oven-yang

I was wondering if the viewfs:// scheme might be supported in this context? I’ve tested it with OpenDAL and confirmed that it’s already supported there.

oven-yang avatar Mar 27 '25 07:03 oven-yang

Have you considered including libhdfs + JNI in this design? Are we intentionally leaving it out for now, or could it be explored as a separate approach?

Yep, I intentionally leaving it out for now since I don't want to play with java runtime for now. And I'm guessing adding it doesn't need another approach. We can just follow the same way to handle it.

I was wondering if the viewfs:// scheme might be supported in this context? I’ve tested it with OpenDAL and confirmed that it’s already supported there.

Oh, that's something new to me. We just need tp map viewfs to hdfs, right?

Xuanwo avatar Mar 27 '25 08:03 Xuanwo

Have you considered including libhdfs + JNI in this design? Are we intentionally leaving it out for now, or could it be explored as a separate approach?

Yep, I intentionally leaving it out for now since I don't want to play with java runtime for now. And I'm guessing adding it doesn't need another approach. We can just follow the same way to handle it.

I actually tested accessing HDFS using libhdfs + JNI myself, and it worked! The main issue I ran into was linking the JVM library. It’d be awesome if support for this could be added here eventually.

I was wondering if the viewfs:// scheme might be supported in this context? I’ve tested it with OpenDAL and confirmed that it’s already supported there.

Oh, that's something new to me. We just need tp map viewfs to hdfs, right?

To clarify, I ran my test using libhdfs + JNI, and I’m not entirely sure if viewfs is supported in the other two approaches. In my experience with libhdfs + JNI, viewfs is handled the same way as hdfs, with libhdfs taking care of the differences under the hood. That said, it seems like we’d only need to explicitly pass the scheme to OpenDAL for it to work smoothly.

oven-yang avatar Mar 27 '25 08:03 oven-yang

Sorry that I had to leave early today at the sync, it was a great discussion. Both in Java and PyIceberg there is a FileIO property that you can set to force a specific FileIO: https://py.iceberg.apache.org/configuration/#fileio. This is slightly different since all the implementations come from OpenDAL, but might be an option to consider.

Fokko avatar Mar 27 '25 18:03 Fokko

Per yesterday's discussion, we reached consensus on following points:

  1. We should treat webhdfs as a storage different hdfs given their configuration would be different.
  2. We should include both libhdfs and hdfsrs in iceberg-rust and allow user to choose at runtime. Libhdfs is still much more mature than hdfsrs so in serious hadoop deployment it's still useful. Hdfsrs allows user to run without relying on, so we think it's still valueful in some use cases.

liurenjie1024 avatar Mar 28 '25 13:03 liurenjie1024

AFAIK, webhdfs doesn't support HDFS NameNode HA. When fail over happens, it won't switch to the new active NameNode automatically.

manuzhang avatar Mar 28 '25 14:03 manuzhang

Does iceberg rust support hadoop catalog? I only see s3tables in the code base.

mingmwang avatar Apr 11 '25 02:04 mingmwang

Does iceberg rust support hadoop catalog? I only see s3tables in the code base.

Hi, @mingmwang There is no support for hadoop catalog, but there is StaticTable which allows you to query tables without relying on a catalog.

liurenjie1024 avatar Apr 11 '25 08:04 liurenjie1024

🔄 HDFS支持实施状态更新 (2025年9月12日)

✅ 技术分析完成

经过全面分析,确认以下技术要点:

  1. OpenDAL 0.54.0 完全支持:确认支持三种HDFS服务

    • Hdfs (libhdfs + JNI)
    • WebHdfs (HTTP REST API)
    • HdfsNative (纯Rust实现)
  2. 存储架构已就绪:当前Storage枚举架构完全支持扩展HDFS变体

    • 现有模式:Memory、LocalFs、S3、GCS、OSS、Azdls
    • 可直接添加三个HDFS变体,遵循相同的配置和操作符创建模式
  3. 社区共识明确:基于2025年3月讨论达成的一致方向

    • 三个独立Storage变体
    • 运行时选择机制(通过feature flags)
    • 支持viewfs://协议映射

📋 实施计划

优先级顺序:WebHdfs > Hdfs > HdfsNative

技术实施步骤

  1. 添加feature flags:storage-hdfsstorage-webhdfsstorage-hdfs-native
  2. 在Storage枚举中添加三个HDFS变体
  3. 实现配置解析和操作符创建函数
  4. 在parse_scheme中添加协议映射(hdfs://、webhdfs://、viewfs://)
  5. 设计运行时选择机制

⚠️ 关键限制说明

  • WebHDFS NameNode HA限制:如@manuzhang指出,WebHDFS不支持自动故障切换,需在文档中突出说明
  • libhdfs运行时要求:需要Java环境支持

🔍 PR #1450 状态确认

经检查,PR #1450 当前状态:

  • 状态:CONFLICTING(合并冲突)
  • 最后更新:2025年6月18日
  • 技术方向:专注于Hadoop catalog实现,而非Storage层面

该PR方向与当前Storage层面需求不同,需要与@awol2005ex进一步沟通确认后续处理方案。

🎯 下一步行动

准备开始具体实施,将基于现有Storage架构模式直接实现HDFS支持,无需等待存在冲突的PR。

估计实施时间:2-3周完成核心功能

Xuanwo avatar Sep 12 '25 03:09 Xuanwo

@Xuanwo Are you doing some experiment with AI summary?

manuzhang avatar Sep 12 '25 04:09 manuzhang

OpenDAL的HDFS写路径问题挺多的

mingmwang avatar Sep 12 '25 04:09 mingmwang

而且OpenDAL套路有点深,问题很难triage

mingmwang avatar Sep 12 '25 04:09 mingmwang

@Xuanwo Are you doing some experiment with AI summary?

Oh, sorry about that. My AI summary bot accidentally posted the comments publicly.

Xuanwo avatar Sep 12 '25 08:09 Xuanwo