
[Bug] Output parsed stats for delta lake tables

Open • sclmn opened this issue 2 years ago • 5 comments

Bug

Currently, if delta.checkpoint.writeStatsAsStruct is set to true, the checkpoint output contains parsed partition values but does not include parsed stats.

As far as I can tell, the code currently writes only the parsed partition values and has no support for parsed stats.

Would it be possible to add stats_parsed?

Motivation

The protocol states:

stats_parsed: The stats can be stored in their original format. This field needs to be written when statistics are available and the table property: delta.checkpoint.writeStatsAsStruct is set to true. When this property is set to false (which is the default), this field should be omitted from the checkpoint.
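
To make the expected behavior concrete, here is a minimal Python sketch of the rule quoted above. It is not delta-spark code. The helper name `checkpoint_add_action`, the dict-based representation of an `add` action, and the sample file name are all assumptions made for illustration. It also ignores delta.checkpoint.writeStatsAsJson and simply passes the JSON `stats` string through unchanged.

```python
import json


def checkpoint_add_action(add, table_properties):
    """Hypothetical helper: build the checkpoint entry for one `add` action.

    `add` is a plain dict standing in for an add action, with the JSON
    statistics string under "stats" (or no "stats" key at all).
    """
    write_struct = (
        table_properties.get("delta.checkpoint.writeStatsAsStruct", "false").lower()
        == "true"
    )
    entry = dict(add)  # shallow copy; the JSON stats string is kept as-is
    stats_json = add.get("stats")
    # Per the quoted rule: write stats_parsed only when statistics are
    # available AND the property is true; otherwise omit the field.
    if write_struct and stats_json is not None:
        entry["stats_parsed"] = json.loads(stats_json)
    return entry


# Illustrative input (made-up values).
add = {
    "path": "part-00000.parquet",
    "partitionValues": {"date": "2023-08-23"},
    "stats": json.dumps(
        {
            "numRecords": 3,
            "minValues": {"id": 1},
            "maxValues": {"id": 9},
            "nullCount": {"id": 0},
        }
    ),
}

with_struct = checkpoint_add_action(add, {"delta.checkpoint.writeStatsAsStruct": "true"})
default = checkpoint_add_action(add, {})
```

With the property set to true, `with_struct` carries a `stats_parsed` entry holding the decoded statistics. With the default properties, `default` has no `stats_parsed` key, which matches the "should be omitted" wording. The bug reported here is that the first case currently behaves like the second.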

sclmn avatar Aug 23 '23 23:08 sclmn

@sclmn are you saying that when delta.checkpoint.writeStatsAsStruct is true, delta-spark is not writing out the stats_parsed field in the delta checkpoint? That seems like a bug. Thanks for pointing this out!

scottsand-db avatar Aug 31 '23 17:08 scottsand-db

@prakharjain09 can you take a look?

scottsand-db avatar Aug 31 '23 17:08 scottsand-db

Hi, I just wanted to check whether you have an update?

sclmn avatar Oct 04 '23 22:10 sclmn

I checked this and this seems like a bug.

prakharjain09 avatar Oct 04 '23 23:10 prakharjain09

@scottsand-db, @prakharjain09, @sclmn, do you have any updates on this?

felipepessoto avatar Sep 19 '24 00:09 felipepessoto

Related: #1719

felipepessoto avatar Nov 05 '24 00:11 felipepessoto

+1 from me. I think writing stats_parsed would be very useful.

Tom-Newton avatar Nov 20 '24 00:11 Tom-Newton