
[Feature][HDFS File Source and Sink] Support a file synchronization function similar to distcp

Open CCweixiao opened this issue 1 year ago • 3 comments

Search before asking

  • [X] I have searched the feature requests and found no similar feature requirement.

Description

Does SeaTunnel support a file synchronization function similar to distcp? Our HBase cluster has no YARN and there is no additional computing cluster, so we would like to use the Zeta engine to implement distcp-style copying. No column extraction or data conversion is needed. The only requirements are that data files are written unchanged to a custom directory on the target cluster, that interrupted transfers can be resumed, and that the files are verified once the copy finishes.
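
The requested behaviour (copy files unchanged, resume after an interruption, verify at the end) can be illustrated with a minimal sketch on a local filesystem. This is not SeaTunnel or distcp code: `sha256_of` and `resumable_copy` are hypothetical helper names, local paths stand in for HDFS paths, and the sketch assumes that any existing partial target file is a correct prefix of the source (the final checksum comparison catches the case where it is not).

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resumable_copy(src, dst, chunk_size=1 << 20):
    """Copy src to dst byte for byte, resuming from dst's current length.

    Hypothetical helper: no parsing, no column extraction, no conversion.
    Returns the verified SHA-256 digest of the copied file.
    """
    offset = os.path.getsize(dst) if os.path.exists(dst) else 0
    if offset > os.path.getsize(src):
        offset = 0  # dst is longer than src, so it cannot be a partial copy

    # "r+b" keeps the bytes already written; "wb" starts from scratch.
    mode = "r+b" if offset else "wb"
    with open(src, "rb") as fin, open(dst, mode) as fout:
        fin.seek(offset)
        fout.seek(offset)
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            fout.write(chunk)
        fout.truncate()  # drop anything beyond the last byte written

    # Final verification: the two files must have identical digests.
    src_digest = sha256_of(src)
    if src_digest != sha256_of(dst):
        raise IOError(f"checksum mismatch after copying {src} to {dst}")
    return src_digest
```

Calling `resumable_copy("a.hfile", "b.hfile")` a second time after an interrupted run continues from the bytes already present in `b.hfile` instead of starting over; the file names here are placeholders.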

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

CCweixiao avatar May 19 '24 01:05 CCweixiao

This PR looks like it matches your requirement: https://github.com/apache/seatunnel/pull/6826

liunaijie avatar May 21 '24 03:05 liunaijie

@liunaijie Thanks for pointing out this PR. @CCweixiao This PR is most likely what you want. Could you try it and give us some feedback?

Hisoka-X avatar May 22 '24 13:05 Hisoka-X

This PR is most likely what you want. Could you try it and give us some feedback?

Thanks for the reply. I'm ready to try this feature and will continue to give feedback.

CCweixiao avatar May 23 '24 09:05 CCweixiao