
Fix parsing null counts for struct type columns in the struct stats

Open · Tom-Newton opened this issue 2 years ago

Description

When reading the struct column stats (as opposed to the JSON column stats), the null counts were ignored for struct-type columns. Ignoring them entirely is probably less harmful than parsing them incorrectly, but if a user was filtering on a struct column, this may have hurt performance. It also spammed the error logs:

[2022-07-26T19:33:52Z ERROR deltalake::action] Expect type of stats_parsed.nullRecords value to be struct, got: {integer: 0, null: 1, boolean: 0, double: 0, decimal: 0, string: 0, binary: 0, date: 0, timestamp: 0, struct: {struct_element: 0}, map: 0, array: 0, nested_struct: {struct_element: {nested_struct_element: 0}}, struct_of_array_of_map: {struct_element: 0}}
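As the log line shows, the parsed null-count record mirrors the table schema: a struct column's null count is itself a struct of per-field counts, nested as deeply as the schema. A minimal sketch of handling this by recursing into struct values instead of rejecting them is below. `ParsedValue`, `ColumnCountStat`, `parse_null_counts` and `parse_null_count` are hypothetical names defined here for illustration; they are a simplified stand-in, not delta-rs's or the parquet crate's actual types or API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a value read from the
/// struct stats (not the parquet crate's real row/field types).
#[derive(Debug, Clone)]
enum ParsedValue {
    Long(i64),
    Struct(Vec<(String, ParsedValue)>),
    Str(String),
}

/// Hypothetical null-count representation: a leaf count for primitive
/// columns, or a nested map of counts for struct columns.
#[derive(Debug, Clone, PartialEq)]
enum ColumnCountStat {
    Column(HashMap<String, ColumnCountStat>),
    Value(i64),
}

/// Parse every field of a null-count record (or of a nested struct).
fn parse_null_counts(
    fields: &[(String, ParsedValue)],
) -> Result<HashMap<String, ColumnCountStat>, String> {
    fields
        .iter()
        .map(|(name, value)| parse_null_count(value).map(|count| (name.clone(), count)))
        .collect()
}

/// Parse one null count, recursing into struct columns rather than
/// treating them as an unexpected type.
fn parse_null_count(value: &ParsedValue) -> Result<ColumnCountStat, String> {
    match value {
        ParsedValue::Long(n) => Ok(ColumnCountStat::Value(*n)),
        ParsedValue::Struct(fields) => parse_null_counts(fields).map(ColumnCountStat::Column),
        other => Err(format!(
            "expected null count to be an integer or struct, got: {:?}",
            other
        )),
    }
}

fn main() {
    // A subset of the nullCount record from the log line above.
    let record = vec![
        ("integer".to_string(), ParsedValue::Long(0)),
        ("null".to_string(), ParsedValue::Long(1)),
        (
            "struct".to_string(),
            ParsedValue::Struct(vec![("struct_element".to_string(), ParsedValue::Long(0))]),
        ),
        (
            "nested_struct".to_string(),
            ParsedValue::Struct(vec![(
                "struct_element".to_string(),
                ParsedValue::Struct(vec![(
                    "nested_struct_element".to_string(),
                    ParsedValue::Long(0),
                )]),
            )]),
        ),
    ];
    match parse_null_counts(&record) {
        Ok(counts) => println!("{:?}", counts),
        Err(e) => eprintln!("{}", e),
    }
}
```

Returning a `Result` keeps genuinely malformed values as errors while struct columns, which are valid, now parse into nested counts.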

I probably should have done this as part of https://github.com/delta-io/delta-rs/pull/656. Sorry for introducing this slight regression. I think some logging was being suppressed during the unit tests, so I didn't notice the problem.

Related Issue(s)

Relates to #653, but that was for the most part already solved by https://github.com/delta-io/delta-rs/pull/656.

Changes

  • In the test, assert that all stats are the same, not just the min_values stats. This guards against a similar mistake in the future.
  • Update the logic to handle struct types when parsing null_counts from struct stats.
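Why comparing all stats is stricter than comparing only min_values can be illustrated with a small sketch. The `Stats` struct and `sample_stats` helper below are hypothetical simplifications (not delta-rs's real stats type), and null counts for nested fields are flattened to dotted keys purely to keep the example short. Deriving `PartialEq` lets one assertion cover every field, so a dropped struct-column null count is caught even when min_values match.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified per-file stats. Nested null counts are
/// flattened to dotted keys here only for brevity.
#[derive(Debug, Clone, PartialEq)]
struct Stats {
    num_records: i64,
    min_values: HashMap<String, i64>,
    max_values: HashMap<String, i64>,
    null_count: HashMap<String, i64>,
}

/// Build a small stats value used as the "expected" side of the comparison.
fn sample_stats() -> Stats {
    let mut min_values = HashMap::new();
    min_values.insert("integer".to_string(), 0);
    let mut max_values = HashMap::new();
    max_values.insert("integer".to_string(), 4);
    let mut null_count = HashMap::new();
    null_count.insert("integer".to_string(), 0);
    null_count.insert("struct.struct_element".to_string(), 0);
    Stats { num_records: 5, min_values, max_values, null_count }
}

fn main() {
    let from_json = sample_stats();
    // Simulate the regression: the struct-stats path drops the
    // struct column's null count.
    let mut from_struct = sample_stats();
    from_struct.null_count.remove("struct.struct_element");

    // A min_values-only check passes even though the stats differ.
    assert_eq!(from_json.min_values, from_struct.min_values);
    // Comparing the whole struct detects the missing null count.
    assert_ne!(from_json, from_struct);
    println!("min_values-only check missed the difference; full comparison caught it");
}
```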

Tom-Newton, Jul 26 '22 19:07