parquet-java
HadoopInputFile to pass down FileStatus when opening file
Rationale for this change
- Saves the overhead of an HTTP HEAD request when opening a file.
- Tells the Hadoop filesystem client that the file being opened is Parquet, so it should use the first policy it recognizes from the list "parquet, columnar, vector, random". These policies can disable prefetching and limit the requested ranges to those best suited to column reads.
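
As a rough illustration of the "first recognized policy" rule, the sketch below walks the comma-separated list in order and returns the first entry the client knows. `ReadPolicyChooser`, `firstRecognized`, and the set of supported policies are hypothetical names and values chosen for the example; they are not part of Hadoop or parquet-java, and the real selection logic lives inside each filesystem client.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper for illustration only; not a Hadoop or parquet-java class. */
public class ReadPolicyChooser {

  /**
   * Returns the first policy in the comma-separated list that appears in
   * the supported set, or an empty Optional if none is recognized.
   */
  static Optional<String> firstRecognized(String policies, Set<String> supported) {
    return Arrays.stream(policies.split(","))
        .map(String::trim)              // tolerate spaces after the commas
        .filter(supported::contains)    // keep only policies this client knows
        .findFirst();                   // list order decides the winner
  }

  public static void main(String[] args) {
    // Assumption: a client that does not know "parquet" or "columnar".
    Set<String> olderClient =
        new HashSet<>(Arrays.asList("vector", "random", "sequential"));

    String chosen = firstRecognized("parquet, columnar, vector, random", olderClient)
        .orElse("default");
    System.out.println(chosen); // prints "vector"
  }
}
```

Listing the policies from most to least specific means a client that does not know "parquet" can still fall back to a sensible choice rather than rejecting the option.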
What changes are included in this PR?
1. Uses reflection to load reflection-friendly bindings to the enhanced openFile method of https://github.com/apache/hadoop/pull/6686. Although openFile() has been present since Hadoop 3.3.0, parquet still builds against Hadoop 2.x, so reflection is required.
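
The general pattern of probing for a newer method by reflection and falling back to an older one can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: `OldFs`, `NewFs`, and their method signatures are hypothetical stand-ins for a filesystem class compiled against an older and a newer Hadoop release.

```java
import java.lang.reflect.Method;

/** Illustrative sketch only; the classes here are hypothetical stand-ins. */
public class ReflectiveOpen {

  /** Stand-in for an older filesystem API that only offers open(String). */
  public static class OldFs {
    public String open(String path) {
      return "open:" + path;
    }
  }

  /** Stand-in for a newer API that also offers openFile(String, long). */
  public static class NewFs extends OldFs {
    public String openFile(String path, long knownLength) {
      return "openFile:" + path + ":" + knownLength;
    }
  }

  /** Looks up openFile by name; returns null when the class lacks it. */
  static Method findOpenFile(Class<?> fsClass) {
    try {
      return fsClass.getMethod("openFile", String.class, long.class);
    } catch (NoSuchMethodException e) {
      return null; // older runtime: the enhanced method is absent
    }
  }

  /** Uses openFile when it is available, otherwise falls back to open. */
  static String open(OldFs fs, String path, long knownLength) {
    Method openFile = findOpenFile(fs.getClass());
    if (openFile == null) {
      return fs.open(path);
    }
    try {
      return (String) openFile.invoke(fs, path, knownLength);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("openFile invocation failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(open(new NewFs(), "a.parquet", 1024L)); // openFile:a.parquet:1024
    System.out.println(open(new OldFs(), "a.parquet", 1024L)); // open:a.parquet
  }
}
```

Because the lookup happens at run time, the calling code compiles without any reference to the newer method. In practice the `Method` lookup would likely be done once and cached rather than repeated on every open.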
Are these changes tested?
Existing tests have been modified. https://github.com/apache/hadoop/pull/6686
Are there any user-facing changes?
No.
Closes #2915