
Add counter information to written parquet files

rtyler opened this issue 4 years ago • 2 comments

Based on Serge's comment, the Parquet file path generation could use some cleanup. It should track a counter, and it should rely on the Parquet writer properties to decide whether to include `snappy` in the file name, rather than always adding it.

rtyler, May 23 '21
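A minimal Rust sketch of the second idea: derive the file extension from the configured compression codec instead of hard-coding `snappy`. The `Compression` enum and `parquet_extension` function are hypothetical stand-ins, not the delta-rs or `parquet` crate API. The extension strings follow the `.snappy.parquet` / `.gz.parquet` style commonly seen in Spark output, which is an assumption here.

```rust
/// Hypothetical stand-in for the codec carried by the Parquet writer
/// properties; the real delta-rs / parquet crate types are not used here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Uncompressed,
    Snappy,
    Gzip,
    Zstd,
}

/// Map the configured codec to a file extension, so the name reflects
/// what the writer actually does instead of always saying "snappy".
fn parquet_extension(codec: Compression) -> &'static str {
    match codec {
        Compression::Uncompressed => ".parquet",
        Compression::Snappy => ".snappy.parquet",
        Compression::Gzip => ".gz.parquet",
        Compression::Zstd => ".zstd.parquet",
    }
}

fn main() {
    // Print the extension chosen for each codec.
    for codec in [
        Compression::Uncompressed,
        Compression::Snappy,
        Compression::Gzip,
        Compression::Zstd,
    ] {
        println!("{:?} -> {}", codec, parquet_extension(codec));
    }
}
```

With this shape, switching the writer to an uncompressed or zstd configuration changes the file name automatically.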

Based on Serge's comment, we also need to use the same UUID for all files written from the same batch; otherwise the counter adds no value.

houqp, May 24 '21

Here's a good enough reference on conventional file name formatting: https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ManifestFileCommitProtocol.scala#L115-L120

nfx, May 24 '21
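The points raised so far (one UUID shared by every file in a batch, plus a per-file counter) can be sketched in Rust, assuming the `part-00000-<uuid>-c000.snappy.parquet` shape commonly seen in Delta tables written by Spark. `BatchFileNamer` and `next_file_name` are hypothetical names for illustration, not delta-rs's `next_data_path`. The UUID is passed in as a string to keep the sketch dependency-free; in practice it might come from a crate such as `uuid`.

```rust
/// Hypothetical helper: names every file written for one batch so that they
/// share a single UUID and differ only by a monotonically increasing counter.
struct BatchFileNamer {
    part: u32,               // partition/task index, zero-padded to 5 digits
    batch_uuid: String,      // generated once per batch, reused for every file
    counter: u32,            // file counter within the batch, zero-padded to 3 digits
    extension: &'static str, // e.g. ".snappy.parquet", derived from the codec
}

impl BatchFileNamer {
    fn new(part: u32, batch_uuid: impl Into<String>, extension: &'static str) -> Self {
        Self {
            part,
            batch_uuid: batch_uuid.into(),
            counter: 0,
            extension,
        }
    }

    /// Return the next file name for this batch and advance the counter.
    fn next_file_name(&mut self) -> String {
        let name = format!(
            "part-{:05}-{}-c{:03}{}",
            self.part, self.batch_uuid, self.counter, self.extension
        );
        self.counter += 1;
        name
    }
}

fn main() {
    // A fixed UUID string keeps this sketch free of external crates.
    let mut namer = BatchFileNamer::new(
        0,
        "2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb",
        ".snappy.parquet",
    );
    println!("{}", namer.next_file_name());
    println!("{}", namer.next_file_name());
}
```

Because the UUID is fixed per namer, files from one batch sort together and only the counter distinguishes them, which is what makes the counter meaningful.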

I think this issue has already been resolved by this PR: the writer function `next_data_path` takes care of correctly constructing the path for a Parquet file.

Jan-Schweizer, Nov 02 '23

You are correct @Jan-Schweizer - will close :)

roeap, Nov 05 '23