presto
Add support for symlink files in quick stats
Description
Closes #25199.
Motivation and Context
Makes quick stats work for Hive tables defined with a symlink file that contains a list of paths where Parquet files are stored. Currently, quick stats fails for such tables because it expects only regular data files in the table directory, whereas a symlink file is a text file that lists the locations of the data files. This change reads the symlink file and follows the paths in it to obtain the Parquet data file paths for quick stats.
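The idea can be illustrated with a minimal, self-contained sketch. This is not the code in this PR: the class `SymlinkManifestSketch` and the method `resolveDataFilePaths` are hypothetical names, the sketch reads from the local file system with `java.nio` rather than a Hadoop `FileSystem`, and it assumes each non-empty line of a manifest is one data file path and that files whose names start with `_` or `.` are skipped.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not the implementation added by this PR.
public final class SymlinkManifestSketch
{
    private SymlinkManifestSketch() {}

    // Returns the data file paths listed in the manifest (symlink) files
    // found directly under tableDirectory.
    public static List<String> resolveDataFilePaths(Path tableDirectory)
            throws IOException
    {
        List<Path> manifests;
        try (Stream<Path> entries = Files.list(tableDirectory)) {
            manifests = entries
                    .filter(Files::isRegularFile)
                    // Assumption: hidden files such as _SUCCESS or .crc are not manifests.
                    .filter(path -> {
                        String name = path.getFileName().toString();
                        return !name.startsWith("_") && !name.startsWith(".");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<String> dataFilePaths = new ArrayList<>();
        for (Path manifest : manifests) {
            // Assumption: one target path per line; blank lines are ignored.
            for (String line : Files.readAllLines(manifest, StandardCharsets.UTF_8)) {
                String target = line.trim();
                if (!target.isEmpty()) {
                    dataFilePaths.add(target);
                }
            }
        }
        return dataFilePaths;
    }
}
```

The returned paths, rather than the manifest file itself, are what a footer-based statistics builder would then open as Parquet files.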
Test Plan
Testing was done on an external Hive table with the following storage format in .prestoSchema:
"storageFormat" : {
  "serDe" : "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
  "inputFormat" : "org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat",
  "outputFormat" : "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
},
The externalLocation contains a manifest file that lists the Parquet file paths to follow.
Contributor checklist
- [x] Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
- [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [x] If release notes are required, they follow the release notes guidelines.
- [x] Adequate tests were added if applicable.
- [x] CI passed.
Release Notes
Please follow release notes guidelines and fill in the release notes below.
== RELEASE NOTES ==
Hive Connector Changes
* Add support for symlink files in :ref:`connector/hive:Quick Stats`.