parquet-java

ParquetWriter::close sometimes fails

Open: asfimport opened this issue 1 year ago • 1 comment

We sometimes run into an exception when closing a ParquetWriter instance:

 


2024-06-10 10:44:01.398    org.apache.parquet.util.AutoCloseables$ParquetCloseResourceException: Unable to close resource
2024-06-10 10:44:01.398        at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:85)
2024-06-10 10:44:01.398        at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:94)
2024-06-10 10:44:01.398        at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:144)
2024-06-10 10:44:01.398        at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:437)
2024-06-10 10:44:01.398    Caused by: java.nio.channels.ClosedChannelException: null
2024-06-10 10:44:01.398        at org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73)
2024-06-10 10:44:01.398        at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:158)
2024-06-10 10:44:01.398        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:639)
2024-06-10 10:44:01.398        at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:594)
2024-06-10 10:44:01.398        at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:136)
2024-06-10 10:44:01.398        at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.close(HadoopPositionOutputStream.java:65)
2024-06-10 10:44:01.398        at org.apache.parquet.hadoop.ParquetFileWriter.close(ParquetFileWriter.java:1663)
2024-06-10 10:44:01.398        at org.apache.parquet.util.AutoCloseables.close(AutoCloseables.java:49)
2024-06-10 10:44:01.398        at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:83)

Reporter: Roelof Naude

Note: This issue was originally created as PARQUET-2496. Please see the migration documentation for further details.

asfimport, Jun 10 '24 11:06

I have spent more time on this. The issue is caused by a double close, triggered by the following call sequence:

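  1. ParquetFileWriter::end calls close in its finally block.
  2. ParquetFileWriter::close closes the PositionOutputStream in a try-with-resources block.
  3. InternalParquetRecordWriter::close calls ParquetFileWriter::end and then, in its finally block, AutoCloseables.uncheckedClose, which calls ParquetFileWriter::close a second time. On that second call, HadoopPositionOutputStream.close calls hflush on the already closed DFSOutputStream, which throws the ClosedChannelException shown above.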
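The call sequence above can be reproduced outside Parquet with a small sketch. Every class and method here (FakeDfsStream, FlushingWrapper, SimplifiedFileWriter, recordWriterClose) is a hypothetical stand-in, not the real Parquet or Hadoop API; the only assumption carried over from the stack trace is that the position stream flushes (hflush) before closing, and that flushing a closed DFS stream throws ClosedChannelException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.ClosedChannelException;

public class DoubleCloseDemo {

    // Hypothetical stand-in for DFSOutputStream: writing or flushing after
    // close throws ClosedChannelException, like checkClosed() in the trace.
    static class FakeDfsStream extends OutputStream {
        private boolean closed = false;

        @Override
        public void write(int b) throws IOException { checkClosed(); }

        @Override
        public void flush() throws IOException { checkClosed(); }

        @Override
        public void close() { closed = true; }

        private void checkClosed() throws IOException {
            if (closed) {
                throw new ClosedChannelException();
            }
        }
    }

    // Hypothetical stand-in for HadoopPositionOutputStream: flushes, then closes.
    static class FlushingWrapper extends OutputStream {
        private final OutputStream out;

        FlushingWrapper(OutputStream out) { this.out = out; }

        @Override
        public void write(int b) throws IOException { out.write(b); }

        @Override
        public void close() throws IOException {
            out.flush(); // plays the role of hflush()
            out.close();
        }
    }

    // Hypothetical, heavily simplified ParquetFileWriter.
    static class SimplifiedFileWriter implements AutoCloseable {
        private final OutputStream out;

        SimplifiedFileWriter(OutputStream out) { this.out = out; }

        // Step 1: end() writes a footer, then calls close() in a finally block.
        void end() throws IOException {
            try {
                out.write(0); // placeholder for the real footer
            } finally {
                close();
            }
        }

        // Step 2: close() releases the stream via try-with-resources.
        @Override
        public void close() throws IOException {
            try (OutputStream stream = out) {
                // nothing else to release in this sketch
            }
        }
    }

    // Step 3: like InternalParquetRecordWriter.close, call end() and then close() again.
    static void recordWriterClose(SimplifiedFileWriter writer) throws IOException {
        try {
            writer.end();
        } finally {
            writer.close(); // second close: the wrapper flushes an already closed stream
        }
    }

    public static void main(String[] args) {
        SimplifiedFileWriter writer =
            new SimplifiedFileWriter(new FlushingWrapper(new FakeDfsStream()));
        try {
            recordWriterClose(writer);
            System.out.println("closed cleanly");
        } catch (IOException e) {
            System.out.println("second close failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Running main prints "second close failed: ClosedChannelException": the first close inside end() succeeds, and the exception comes only from the redundant second close, matching the "Caused by" frame in the report.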

It looks like ParquetFileWriter::close needs protection against being called twice.
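One common way to add that protection is an idempotent close guard, so that only the first call releases the stream. The sketch below assumes this approach; GuardedFileWriter and StrictStream are hypothetical classes, not the actual ParquetFileWriter code, and this is not necessarily the fix adopted upstream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

public class IdempotentCloseDemo {

    // Hypothetical stream that refuses to be written or flushed after close.
    static class StrictStream extends OutputStream {
        boolean closed = false;
        int closeCount = 0;

        @Override
        public void write(int b) throws IOException { ensureOpen(); }

        @Override
        public void flush() throws IOException { ensureOpen(); }

        @Override
        public void close() {
            closed = true;
            closeCount++;
        }

        private void ensureOpen() throws IOException {
            if (closed) {
                throw new IOException("stream already closed");
            }
        }
    }

    // Hypothetical writer whose close() releases the stream only once.
    static class GuardedFileWriter implements AutoCloseable {
        private final OutputStream out;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        GuardedFileWriter(OutputStream out) { this.out = out; }

        void end() throws IOException {
            try {
                out.write(0); // placeholder for the real footer
            } finally {
                close();
            }
        }

        @Override
        public void close() throws IOException {
            // compareAndSet succeeds for the first caller only; later calls return early.
            if (!closed.compareAndSet(false, true)) {
                return;
            }
            try (OutputStream stream = out) {
                stream.flush();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StrictStream stream = new StrictStream();
        GuardedFileWriter writer = new GuardedFileWriter(stream);
        try {
            writer.end();   // first close happens inside end()
        } finally {
            writer.close(); // second close is now a no-op instead of an error
        }
        System.out.println("close count: " + stream.closeCount); // prints "close count: 1"
    }
}
```

AtomicBoolean also covers concurrent close calls; a plain boolean field would be enough if close is only ever called from one thread. Note that the flag is set before the stream is released, so if the first close throws, later calls will not retry; that matches the java.io.Closeable contract that closing an already closed stream has no effect.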

naude-r, Jul 17 '24 07:07