[FLINK-29126][hive] Fix wrong logic for splitting file for orc format
What is the purpose of the change
Fix the file split optimization, which currently does not work for the ORC format.
Brief change log
- Change "orc".equalsIgnoreCase(serializationLib) to !serializationLib.toLowerCase().contains("orc") to check whether the format is ORC, since the serializationLib for the ORC format is actually OrcSerde.
- Add an optimization that uses multiple threads to calculate the total size of the files.
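To illustrate both items, here is a minimal sketch, not the code from this PR. The class OrcSplitSketch and its methods are hypothetical names, the fully qualified serde name is assumed for the example, and the file sizes are passed in directly instead of being read from a file system. The sketch drops the leading negation and only compares what the two expressions return for the same input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class for illustration only; not part of Flink.
final class OrcSplitSketch {

    // Old check: true only if the serialization lib is literally "orc" (ignoring case).
    static boolean isOrcByEquals(String serializationLib) {
        return "orc".equalsIgnoreCase(serializationLib);
    }

    // New check: true if the serialization lib name contains "orc" anywhere.
    static boolean isOrcByContains(String serializationLib) {
        return serializationLib.toLowerCase().contains("orc");
    }

    // Sums file sizes on a fixed thread pool, one task per file.
    // Assumption: a real source would look up each size from the file system
    // inside the task; here the sizes are given so the sketch stays self-contained.
    static long totalSize(List<Long> fileSizes, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (Long size : fileSizes) {
                futures.add(pool.submit(() -> size));
            }
            long total = 0L;
            for (Future<Long> future : futures) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Failed to compute total file size", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Assumed value of serializationLib for an ORC table.
        String lib = "org.apache.hadoop.hive.ql.io.orc.OrcSerde";
        System.out.println(isOrcByEquals(lib));
        System.out.println(isOrcByContains(lib));
        System.out.println(totalSize(Arrays.asList(10L, 20L, 30L), 2));
    }
}
```

With a serde class name as input, the exact-match check never succeeds, which is why the substring check is needed to recognize ORC tables.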
Verifying this change
UT in HiveSourceFileEnumeratorTest.java
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
- The serializers: (yes / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
- The S3 file system connector: (yes / no / don't know)
Documentation
- Does this pull request introduce a new feature? (yes / no)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
CI report:
- c391d6a3779a54a3de268ce1ca40768559715975 Azure: SUCCESS
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure: re-run the last Azure build
@godfreyhe I have addressed your comments. Could you please review again when you're free?