ParquetOutputFormat should support custom OutputCommitter

Open asfimport opened this issue 9 years ago • 1 comment

ParquetOutputFormat should support custom OutputCommitter.

There is a need to bypass the current Hadoop behaviour of writing output data under a _temporary folder. With AWS S3 in particular, moving the files from the _temporary folder to the output folder can add a huge overhead.

Reporter: Mikko Kupsu
Assignee: Steve Loughran / @steveloughran

Note: This issue was originally created as PARQUET-781. Please see the migration documentation for further details.

asfimport, Nov 23 '16 13:11

Steve Loughran / @steveloughran: The strategy I propose for this is straightforward:

change the type of the committer field to OutputCommitter
if (jobConf.getBoolean("option to use path output committer", false)) {
  outputCommitter = super.getOutputCommitter()
  if ParquetOutputFormat.getJobSummaryLevel(configuration) != NONE, log at warn and continue
}

There shouldn't be any need to do reflection games.
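
The selection logic proposed above can be sketched in plain Java. This is only an illustration under stated assumptions, not the parquet-java implementation: the `Committer` interface and `SummaryLevel` enum are stand-ins for Hadoop's `OutputCommitter` and Parquet's job summary level, the option key `example.parquet.use.path.output.committer` is hypothetical (the issue does not name the real key), and the wording of the warning reflects one reading of why a warning is wanted, namely that a committer other than the Parquet one may not write summary files.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: none of these types are the real Hadoop or Parquet classes. */
final class CommitterSelectionSketch {

    /** Stand-in for Hadoop's OutputCommitter. */
    interface Committer {
        String name();
    }

    /** Stand-in for Parquet's job summary level. */
    enum SummaryLevel { NONE, COMMON_ONLY, ALL }

    /** Hypothetical option key; the issue does not specify the real one. */
    static final String USE_PATH_COMMITTER = "example.parquet.use.path.output.committer";

    /** The chosen committer, plus whether a warning was logged. */
    static final class Selection {
        final Committer committer;
        final boolean warned;

        Selection(Committer committer, boolean warned) {
            this.committer = committer;
            this.warned = warned;
        }
    }

    /**
     * Keeps the Parquet committer unless the option is set. When the option
     * is set, returns the committer the superclass would supply, warning
     * (but continuing) if job summaries were requested.
     */
    static Selection select(Map<String, String> conf,
                            SummaryLevel level,
                            Committer parquetCommitter,
                            Committer superCommitter) {
        boolean usePathCommitter =
                Boolean.parseBoolean(conf.getOrDefault(USE_PATH_COMMITTER, "false"));
        if (!usePathCommitter) {
            return new Selection(parquetCommitter, false);
        }
        boolean warn = level != SummaryLevel.NONE;
        if (warn) {
            // Assumed wording; the proposal only says "log at warn and continue".
            System.err.println("WARN: job summary level is " + level
                    + " but a non-Parquet committer is in use;"
                    + " summary files may not be written");
        }
        return new Selection(superCommitter, warn);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(USE_PATH_COMMITTER, "true");
        Selection chosen = select(conf, SummaryLevel.ALL, () -> "parquet", () -> "path");
        System.out.println(chosen.committer.name() + " warned=" + chosen.warned);
    }
}
```

Because the switch defaults to false, existing jobs would keep the current committer; only jobs that opt in get the superclass committer, which matches the point that no reflection is needed.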

asfimport, May 30 '24 12:05