HIVE-28268: Iceberg: Retrieve row count from iceberg SnapshotSummary in case of iceberg.hive.keep.stats=false
What changes were proposed in this pull request?
At present, when iceberg.hive.keep.stats=true and hive.compute.query.using.stats=true, HS2 runs a fetch task that reads the Iceberg table's numRows property from HMS to optimize count queries.
If iceberg.hive.keep.stats=false, HS2 always launches a Tez task to compute the table's row count when running a count query.
However, an Iceberg table's own metadata already carries stats, so when iceberg.hive.keep.stats=false or no stats are stored in HMS, HS2 can still start a fetch task that retrieves the row count from the Iceberg snapshot summary. This avoids launching a Tez task to compute the table's row count.
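As a rough illustration of the idea (not the code in this patch), the sketch below reads a row count from a snapshot summary modeled as a plain string map. The key names follow Iceberg's snapshot summary properties (total-records, total-equality-deletes, total-position-deletes). The class name SnapshotRowCount and the rule of giving up whenever delete rows are present are assumptions made for the example.

```java
import java.util.Map;
import java.util.OptionalLong;

final class SnapshotRowCount {
  // Property names as they appear in an Iceberg snapshot summary.
  static final String TOTAL_RECORDS = "total-records";
  static final String TOTAL_EQ_DELETES = "total-equality-deletes";
  static final String TOTAL_POS_DELETES = "total-position-deletes";

  /**
   * Returns the row count from the summary, or empty when the summary
   * cannot answer the question and the caller should fall back to a
   * full count job.
   */
  static OptionalLong rowCount(Map<String, String> summary) {
    String total = summary.get(TOTAL_RECORDS);
    if (total == null) {
      return OptionalLong.empty(); // summary has no record count
    }
    // Assumption for this sketch: if any delete rows exist, the raw
    // record count may overstate the live rows, so do not trust it.
    long eqDeletes = Long.parseLong(summary.getOrDefault(TOTAL_EQ_DELETES, "0"));
    long posDeletes = Long.parseLong(summary.getOrDefault(TOTAL_POS_DELETES, "0"));
    if (eqDeletes > 0 || posDeletes > 0) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(Long.parseLong(total));
  }
}
```

A caller would use the value when it is present and otherwise fall back to the existing Tez-based count.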
Note that a time-travel, branch, or tag query reads a different snapshot than the current one and therefore has different stats, so we need to resolve the snapshot ID that matches the requested table version. Otherwise, we would return the wrong stats for time-travel, branch, or tag queries.
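The snapshot selection described above can be sketched as follows. This is a simplified model, not the patch itself: the class SnapshotResolver, its parameters, and the precedence (explicit snapshot ID first, then branch or tag name, then the current snapshot) are hypothetical and chosen only to show why the stats lookup must start from the resolved snapshot.

```java
import java.util.Map;

final class SnapshotResolver {
  /**
   * Picks the snapshot whose summary should be used for stats.
   *
   * @param currentSnapshotId snapshot the table currently points to
   * @param refs              branch/tag name to snapshot ID (hypothetical model)
   * @param asOfSnapshotId    snapshot requested by time travel, or null
   * @param refName           branch or tag requested by the query, or null
   */
  static long resolve(long currentSnapshotId, Map<String, Long> refs,
                      Long asOfSnapshotId, String refName) {
    if (asOfSnapshotId != null) {
      return asOfSnapshotId; // time travel to an explicit snapshot
    }
    if (refName != null) {
      Long id = refs.get(refName);
      if (id == null) {
        throw new IllegalArgumentException("Unknown branch or tag: " + refName);
      }
      return id; // branch or tag query
    }
    return currentSnapshotId; // plain query on the current table state
  }
}
```

Reading the summary of the current snapshot for every query would give a tag query the row count of the latest table state, which is the wrong-stats case the description warns about.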
Why are the changes needed?
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Qtest