flink
[WIP] [FLINK-28773][hive] Fix HiveTableSink failed to report statistic to hive metastore
What is the purpose of the change
To fix the issue that HiveTableSink fails to report statistics to the Hive metastore.
Brief change log
- When committing a non-partitioned table or a partition, try to collect statistics from its files, and then write these statistics to the metastore.
- When collecting statistics from files, try to get a `RecordReader` according to the table's input format. If the `RecordReader` is an instance of `StatsProvidingRecordReader`, which means the format is ORC or Parquet, we collect the statistics `numRows`, `fileSize`, `numFiles`, and `rawDataSize`. If not, we only collect `fileSize` and `numFiles`, since the other statistics are hard to collect efficiently.
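
The branching described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Reader`, `StatsReader`, `DataFile`, `fixedStats`, and `plainReader` are hypothetical stand-ins (the real change works with Hive's `RecordReader` and `StatsProvidingRecordReader`), and the sketch assumes that row-level statistics are only reported when every file's reader can provide them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FileStatsSketch {

    /** Hypothetical stand-in for the record reader obtained from the table's input format. */
    public interface Reader {}

    /** Hypothetical stand-in for a reader that can report statistics (the ORC/Parquet case). */
    public interface StatsReader extends Reader {
        long rowCount();
        long rawDataSize();
    }

    /** Hypothetical description of one data file: its size on disk plus the reader opened for it. */
    public static final class DataFile {
        final long sizeInBytes;
        final Reader reader;

        public DataFile(long sizeInBytes, Reader reader) {
            this.sizeInBytes = sizeInBytes;
            this.reader = reader;
        }
    }

    /** Illustration-only factory: a stats-providing reader that returns fixed values. */
    public static StatsReader fixedStats(final long rows, final long rawSize) {
        return new StatsReader() {
            @Override public long rowCount() { return rows; }
            @Override public long rawDataSize() { return rawSize; }
        };
    }

    /** Illustration-only factory: a reader that cannot report statistics. */
    public static Reader plainReader() {
        return new Reader() {};
    }

    /** Aggregates statistics over the files of one non-partitioned table or one partition. */
    public static Map<String, Long> collect(List<DataFile> files) {
        long numFiles = 0;
        long fileSize = 0;
        long numRows = 0;
        long rawDataSize = 0;
        boolean everyReaderHasStats = true;

        for (DataFile file : files) {
            // File count and size are cheap to obtain for any format.
            numFiles++;
            fileSize += file.sizeInBytes;
            if (file.reader instanceof StatsReader) {
                // The reader exposes row count and raw size without scanning the data.
                StatsReader stats = (StatsReader) file.reader;
                numRows += stats.rowCount();
                rawDataSize += stats.rawDataSize();
            } else {
                everyReaderHasStats = false;
            }
        }

        Map<String, Long> result = new LinkedHashMap<>();
        result.put("numFiles", numFiles);
        result.put("fileSize", fileSize);
        // Assumption of this sketch: skip row-level statistics unless all files provided them.
        if (everyReaderHasStats) {
            result.put("numRows", numRows);
            result.put("rawDataSize", rawDataSize);
        }
        return result;
    }
}
```

In this sketch, a table whose files all come with stats-providing readers yields all four entries, while a table in any other format yields only `numFiles` and `fileSize`.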
Verifying this change
Added unit tests.
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no)
- The serializers: (yes / no / don't know)
- The runtime per-record code paths (performance sensitive): (yes / no / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
- The S3 file system connector: (yes / no / don't know)
Documentation
- Does this pull request introduce a new feature? (yes / no)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
CI report:
- d7324785828f1b51b77a190c25b0bd225c141caa Azure: FAILURE
Bot commands
The @flinkbot bot supports the following commands:
- `@flinkbot run azure` re-run the last Azure build