[Bug] Output parsed stats for Delta Lake tables
Bug
Currently, if delta.checkpoint.writeStatsAsStruct is set to true, the checkpoint output contains parsed partition values but no parsed stats.
As far as I can tell, the code only handles parsed partition values right now; there is no support for parsed stats.
Would it be possible to add stats_parsed?
Motivation
The protocol states:
stats_parsed: The stats can be stored in their original format. This field needs to be written when statistics are available and the table property: delta.checkpoint.writeStatsAsStruct is set to true. When this property is set to false (which is the default), this field should be omitted from the checkpoint.
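To make the expected behavior concrete, here is a minimal Python sketch of the rule quoted above. It is not the delta-spark implementation: the function name `checkpoint_add_entry` and the dict-based representation of an add action are hypothetical, and `json.loads` stands in for a real conversion of the stats into the columns' native types.

```python
import json

# Table property named in the protocol text above.
STATS_AS_STRUCT = "delta.checkpoint.writeStatsAsStruct"


def checkpoint_add_entry(add_action, table_properties):
    """Return an add entry as it might look in a checkpoint (illustrative only).

    `add_action` is a plain dict whose optional "stats" key holds the JSON
    string; `table_properties` maps property names to string values.
    """
    entry = dict(add_action)
    # The property defaults to false, so a missing key means "omit stats_parsed".
    as_struct = table_properties.get(STATS_AS_STRUCT, "false").lower() == "true"
    stats_json = add_action.get("stats")
    # Write stats_parsed only when the property is true AND stats are available.
    if as_struct and stats_json is not None:
        entry["stats_parsed"] = json.loads(stats_json)
    return entry


add = {
    "path": "part-00000.parquet",  # hypothetical file name
    "partitionValues": {},
    "stats": json.dumps({
        "numRecords": 3,
        "minValues": {"id": 1},
        "maxValues": {"id": 3},
        "nullCount": {"id": 0},
    }),
}

with_struct = checkpoint_add_entry(add, {STATS_AS_STRUCT: "true"})
default = checkpoint_add_entry(add, {})

print(with_struct["stats_parsed"]["numRecords"])  # 3
print("stats_parsed" in default)                  # False
```

With the property set to true I would expect the second shape (stats_parsed present) in the checkpoint, but what I observe matches the default case.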
@sclmn are you saying that when delta.checkpoint.writeStatsAsStruct is true, delta-spark is not writing out the stats_parsed field in the delta checkpoint? That seems like a bug. Thanks for pointing this out!
@prakharjain09 can you take a look?
Hi, I just wanted to check whether there is an update on this.
I checked this, and it does seem to be a bug.
@scottsand-db, @prakharjain09, @sclmn, do you have any updates on this?
Related: #1719
+1 from me. I think writing stats_parsed would be very useful.