parquet-java

HadoopInputFile to pass down FileStatus when opening file

Open steveloughran opened this issue 7 months ago • 0 comments

Rationale for this change

  • Saves the overhead of an HTTP HEAD request when opening a file.
  • Tells the Hadoop FS client that the file being opened is Parquet, so it should use the first policy it recognizes from "parquet, columnar, vector, random". These policies can disable prefetching and limit the ranges requested to those optimal for columns.

What changes are included in this PR?

1. Uses reflection to load the reflection-friendly bindings to the enhanced openFile() method from https://github.com/apache/hadoop/pull/6686. Although openFile() has been present since Hadoop 3.3.0, reflection is required because parquet still builds against Hadoop 2.x.
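
As a rough illustration of the reflection-with-fallback pattern described above (not the code in this PR), here is a minimal sketch. `LegacyFs`, `ModernFs`, and `openWithStatus` are hypothetical stand-ins invented for this example so that it runs without Hadoop on the classpath; the real change binds to Hadoop's openFile() through the bindings from the linked Hadoop PR.

```java
import java.lang.reflect.Method;

public class ReflectiveOpenDemo {

  /** Hypothetical stand-in for an older filesystem API: only a plain open(path). */
  public static class LegacyFs {
    public String open(String path) {
      return "open(" + path + ") after length probe";
    }
  }

  /** Hypothetical stand-in for a newer API that accepts a known length and a read policy. */
  public static class ModernFs extends LegacyFs {
    public String openWithStatus(String path, long length, String policy) {
      return "open(" + path + ", len=" + length + ", policy=" + policy + ")";
    }
  }

  /** Policy list quoted in the description above; the client uses the first entry it recognizes. */
  static final String READ_POLICY = "parquet, columnar, vector, random";

  /** Looks up the enhanced method by name; returns null when the class does not declare it. */
  static Method findEnhancedOpen(Class<?> fsClass) {
    try {
      return fsClass.getMethod("openWithStatus", String.class, long.class, String.class);
    } catch (NoSuchMethodException e) {
      return null;
    }
  }

  /** Uses the enhanced call when it is available, otherwise falls back to the classic open(). */
  public static String open(LegacyFs fs, String path, long knownLength) {
    Method enhanced = findEnhancedOpen(fs.getClass());
    if (enhanced != null) {
      try {
        return (String) enhanced.invoke(fs, path, knownLength, READ_POLICY);
      } catch (ReflectiveOperationException e) {
        // Reflection failed at call time: fall through to the classic call.
      }
    }
    return fs.open(path);
  }

  public static void main(String[] args) {
    System.out.println(open(new ModernFs(), "data.parquet", 1024L));
    System.out.println(open(new LegacyFs(), "data.parquet", 1024L));
  }
}
```

The caller never references the newer method at compile time, so the same class file loads against either stand-in. Production code would normally resolve the `Method` once and cache it rather than looking it up on every open.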

Are these changes tested?

Existing tests have been modified. https://github.com/apache/hadoop/pull/6686

Are there any user-facing changes?

No.

Closes #2915

steveloughran, Jul 15 '24 18:07