flink icon indicating copy to clipboard operation
flink copied to clipboard

[FLINK-27805][Connectors/ORC] bump orc version to 1.7.5

Open liujiawinds opened this issue 2 years ago • 11 comments

What is the purpose of the change

In order to use new features (zstd compression, column encryption etc.) in 1.6.x and 1.7.x.

Release Notes

Brief change log

  • Update orc.version to 1.7.5
  • Clone a new version of PhysicalFsWriter for files to create a PhysicalWriterImpl for streams
  • Enable encryption & mask configuration.
  • Extract encryption setup methods from WriterImpl for PhysicalWriterImpl.
  • Unify the column names used in the test case to match the column names in the file.

Verifying this change

This change added tests and can be verified as follows:

  • Read and write ORC file with ZSTD compression
  • Read and write ORC file with encryption &mask key configuration

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

liujiawinds avatar May 30 '22 08:05 liujiawinds

CI report:

  • f3bf29e2b2910bfd62da0702397da6b7a9034e24 UNKNOWN
  • 074b0e3263647e65c0f3b3bf26117bd48c5ee977 UNKNOWN
  • aafe8a07c5b0722220d895b13248fc18bc82b6b8 Azure: SUCCESS
Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

flinkbot avatar May 30 '22 08:05 flinkbot

@JingsongLi Could you help review this pr?

liujiawinds avatar Jun 10 '22 12:06 liujiawinds

I have submitted two pr to ORC community. ORC-1200: Extracting encryption setup logic from WriterImpl ORC-1198: Add a new PhysicalFsWriter constructor with FSDataOutputStream parameter

I will refactor this part of the code after they are merged.

liujiawinds avatar Jun 13 '22 02:06 liujiawinds

@lirui-apache Please take a look.

liujiawinds avatar Jun 20 '22 08:06 liujiawinds

For a record, to reviewers, ORC-1198 is already shipped via Apache ORC 1.7.5 and included in this PR.

dongjoon-hyun avatar Jul 21 '22 05:07 dongjoon-hyun

Could you review this when you have some time, @mbalassi and @gyfora ? :)

dongjoon-hyun avatar Jul 21 '22 05:07 dongjoon-hyun

Sure, we will take a look :)

gyfora avatar Jul 21 '22 07:07 gyfora

Personally, I'd like to recommend you to remove Encryption part from this PR completely.

Since the encryption is not part of the official Apache ORC, +1 for removing this from the PR

MartijnVisser avatar Jul 24 '22 18:07 MartijnVisser

Also, cc @williamhyun since he works as a release manager of Apache ORC 1.8.0.

dongjoon-hyun avatar Jul 24 '22 21:07 dongjoon-hyun

@dongjoon-hyun @MartijnVisser Encryption part has been removed from this PR.

liujiawinds avatar Jul 25 '22 01:07 liujiawinds

Hi, could you review this once more, @mbalassi , @gyfora , @morhidi , @MartijnVisser ?

dongjoon-hyun avatar Aug 08 '22 17:08 dongjoon-hyun

@liujiawinds are you still working on this one? Happy to take over if its up for grabs

pgaref avatar Mar 13 '23 21:03 pgaref

@pgaref Feel free to take over this.

liujiawinds avatar Mar 14 '23 01:03 liujiawinds

Thank you, @pgaref and @liujiawinds .

dongjoon-hyun avatar Mar 15 '23 00:03 dongjoon-hyun

Surpassed by https://github.com/apache/flink/pull/22481

dmvk avatar May 03 '23 09:05 dmvk