
[WIP] [FLINK-28773][hive] Fix HiveTableSink failing to report statistics to Hive metastore

Open · luoyuxia opened this pull request

What is the purpose of the change

To fix HiveTableSink failing to report table and partition statistics to the Hive metastore.

Brief change log

  • When committing a non-partitioned table or a partition, collect statistics from the files in that table or partition and write them to the metastore.
  • When collecting statistics from files, obtain a RecordReader according to the table's input format. If the RecordReader is an instance of StatsProvidingRecordReader, which means the format is ORC or Parquet, collect numRows, fileSize, numFiles, and rawDataSize. Otherwise, collect only fileSize and numFiles, since the other statistics cannot be collected efficiently.
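The collection rule above can be sketched as a small per-table or per-partition aggregation. This is a hypothetical sketch, not the PR's code, and it uses no Flink or Hive classes: `FileStats`, `AggregatedStats`, `aggregate`, and `toParameters` are illustrative names. A file's row count and raw data size are present only when its reader behaves like a `StatsProvidingRecordReader`. The sketch assumes that `numRows` and `rawDataSize` should be reported only when every file supplies them, so a partition with mixed formats never gets a partial count. `toParameters` maps the result onto the basic-stats parameter names Hive uses (`numFiles`, `totalSize`, `numRows`, `rawDataSize`).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical sketch: aggregate per-file stats for one table or partition before the metastore update. */
public class FileStatsSketch {

    /** Stats for one data file; row count and raw size are empty when the reader cannot provide them. */
    public static final class FileStats {
        public final long fileSize;
        public final OptionalLong numRows;
        public final OptionalLong rawDataSize;

        public FileStats(long fileSize, OptionalLong numRows, OptionalLong rawDataSize) {
            this.fileSize = fileSize;
            this.numRows = numRows;
            this.rawDataSize = rawDataSize;
        }
    }

    /** Aggregated stats for a non-partitioned table or a single partition. */
    public static final class AggregatedStats {
        public final long numFiles;
        public final long totalSize;
        public final OptionalLong numRows;
        public final OptionalLong rawDataSize;

        public AggregatedStats(long numFiles, long totalSize, OptionalLong numRows, OptionalLong rawDataSize) {
            this.numFiles = numFiles;
            this.totalSize = totalSize;
            this.numRows = numRows;
            this.rawDataSize = rawDataSize;
        }

        /** Maps the stats onto Hive's basic-stats parameter names; row-level stats only when known. */
        public Map<String, String> toParameters() {
            Map<String, String> params = new LinkedHashMap<>();
            params.put("numFiles", Long.toString(numFiles));
            params.put("totalSize", Long.toString(totalSize));
            numRows.ifPresent(n -> params.put("numRows", Long.toString(n)));
            rawDataSize.ifPresent(n -> params.put("rawDataSize", Long.toString(n)));
            return params;
        }
    }

    /** File count and total size are always summed; row stats only if every file reports them (assumption). */
    public static AggregatedStats aggregate(List<FileStats> files) {
        long totalSize = 0;
        long rows = 0;
        long raw = 0;
        boolean allFilesReportRowStats = true;
        for (FileStats f : files) {
            totalSize += f.fileSize;
            if (f.numRows.isPresent() && f.rawDataSize.isPresent()) {
                rows += f.numRows.getAsLong();
                raw += f.rawDataSize.getAsLong();
            } else {
                allFilesReportRowStats = false;
            }
        }
        return new AggregatedStats(
                files.size(),
                totalSize,
                allFilesReportRowStats ? OptionalLong.of(rows) : OptionalLong.empty(),
                allFilesReportRowStats ? OptionalLong.of(raw) : OptionalLong.empty());
    }

    public static void main(String[] args) {
        // ORC/Parquet-like files: the reader reports row count and raw data size.
        List<FileStats> columnar = List.of(
                new FileStats(1024, OptionalLong.of(100), OptionalLong.of(4000)),
                new FileStats(2048, OptionalLong.of(250), OptionalLong.of(9000)));
        // Text-like file: only the file size is known.
        List<FileStats> text = List.of(new FileStats(512, OptionalLong.empty(), OptionalLong.empty()));

        System.out.println(aggregate(columnar).toParameters()); // {numFiles=2, totalSize=3072, numRows=350, rawDataSize=13000}
        System.out.println(aggregate(text).toParameters());     // {numFiles=1, totalSize=512}
    }
}
```

If a partition mixes formats, the sketch drops `numRows` and `rawDataSize` instead of reporting a count that covers only some files. The description does not say how the PR handles that case.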

Verifying this change

Added unit tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

luoyuxia · Aug 11 '22

CI report:

  • d7324785828f1b51b77a190c25b0bd225c141caa Azure: FAILURE
Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

flinkbot · Aug 11 '22

@flinkbot run azure

luoyuxia · Aug 22 '22

@flinkbot run azure

luoyuxia · Aug 22 '22