[FLINK-27805][Connectors/ORC] bump orc version to 1.7.5
What is the purpose of the change
To make use of new features introduced in ORC 1.6.x and 1.7.x, such as ZSTD compression and column encryption.
Brief change log
- Update orc.version to 1.7.5.
- Clone a new version of PhysicalFsWriter (for files) to create a PhysicalWriterImpl (for streams).
- Enable encryption & mask configuration.
- Extract encryption setup methods from WriterImpl for PhysicalWriterImpl.
- Unify the column names used in the test case to match the column names in the file.
Verifying this change
This change added tests and can be verified as follows:
- Read and write an ORC file with ZSTD compression
- Read and write an ORC file with encryption & mask key configuration
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes)
- The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)
Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
CI report:
- f3bf29e2b2910bfd62da0702397da6b7a9034e24 UNKNOWN
- 074b0e3263647e65c0f3b3bf26117bd48c5ee977 UNKNOWN
- aafe8a07c5b0722220d895b13248fc18bc82b6b8 Azure: SUCCESS
@JingsongLi Could you help review this PR?
I have submitted two PRs to the ORC community:
- ORC-1200: Extracting encryption setup logic from WriterImpl
- ORC-1198: Add a new PhysicalFsWriter constructor with FSDataOutputStream parameter
I will refactor this part of the code after they are merged.
@lirui-apache Please take a look.
For the record, reviewers: ORC-1198 has already shipped in Apache ORC 1.7.5 and is included in this PR.
Could you review this when you have some time, @mbalassi and @gyfora? :)
Sure, we will take a look :)
Personally, I'd recommend removing the encryption part from this PR completely.
Since encryption is not part of official Apache ORC, +1 for removing it from this PR.
Also, cc @williamhyun since he works as a release manager of Apache ORC 1.8.0.
@dongjoon-hyun @MartijnVisser The encryption part has been removed from this PR.
Hi, could you review this once more, @mbalassi, @gyfora, @morhidi, @MartijnVisser?
@liujiawinds are you still working on this one? Happy to take over if it's up for grabs.
@pgaref Feel free to take over this.
Thank you, @pgaref and @liujiawinds.
Superseded by https://github.com/apache/flink/pull/22481