[FLINK-29126][hive] Fix wrong logic for splitting file for orc format

Open luoyuxia opened this issue 3 years ago • 1 comments

What is the purpose of the change

Fix the file split optimization, which currently does not work for the ORC format.

Brief change log

  • Change "orc".equalsIgnoreCase(serializationLib) to !serializationLib.toLowerCase().contains("orc") to check whether the format is ORC, since the serializationLib for the ORC format is actually OrcSerde, so the exact comparison with "orc" never matches.

  • Add an optimization that uses multiple threads to calculate the total size of the files.
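
The two changes above can be illustrated with a minimal, self-contained sketch. It is not the code from this pull request: the class name OrcSplitSketch, the method names, and the use of LongSupplier as a stand-in for a per-file size lookup are all hypothetical, and the thread-pool shape is only one plausible way to parallelize the size calculation. The sketch assumes the serialization lib is reported as the fully qualified SerDe class name org.apache.hadoop.hive.ql.io.orc.OrcSerde. How the surrounding code consumes the check (including the negation in the change log) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongSupplier;

// Hypothetical illustration only; names are not taken from the Flink code base.
public class OrcSplitSketch {

    // Old-style check: true only when the value is literally "orc" (any case).
    public static boolean isOrcExactMatch(String serializationLib) {
        return "orc".equalsIgnoreCase(serializationLib);
    }

    // Substring check: also true for a fully qualified SerDe class name.
    public static boolean isOrcSubstringMatch(String serializationLib) {
        return serializationLib.toLowerCase(Locale.ROOT).contains("orc");
    }

    // Sums per-file sizes on a fixed-size thread pool. Each LongSupplier is an
    // assumed stand-in for a (possibly slow) file system size lookup.
    public static long totalSize(List<LongSupplier> sizeLookups, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (LongSupplier lookup : sizeLookups) {
                Callable<Long> task = lookup::getAsLong;
                futures.add(pool.submit(task));
            }
            long total = 0L;
            for (Future<Long> future : futures) {
                total += future.get(); // blocks until that lookup finishes
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while summing sizes", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Size lookup failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String lib = "org.apache.hadoop.hive.ql.io.orc.OrcSerde";
        System.out.println("exact match:     " + isOrcExactMatch(lib));
        System.out.println("substring match: " + isOrcSubstringMatch(lib));

        List<LongSupplier> lookups = new ArrayList<>();
        lookups.add(() -> 10L);
        lookups.add(() -> 20L);
        lookups.add(() -> 12L);
        System.out.println("total size:      " + totalSize(lookups, 2));
    }
}
```

A substring check is deliberately loose: it would also match any other serialization lib whose name happens to contain "orc", which may or may not matter for the tables involved.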

Verifying this change

Unit tests in HiveSourceFileEnumeratorTest.java

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Aug 29 '22 03:08 luoyuxia

CI report:

  • c391d6a3779a54a3de268ce1ca40768559715975 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

Aug 29 '22 03:08 flinkbot

@godfreyhe I have addressed your comments. Could you please review again when you're free?

Sep 28 '22 06:09 luoyuxia