
Deprecate RowGroup.file_offset

Open asfimport opened this issue 4 years ago • 5 comments

Due to PARQUET-2078, RowGroup.file_offset is not reliable.

This field is also calculated incorrectly in the open-source C++ Parquet implementation (PARQUET-2089).

Reporter: Gabor Szadovszky / @gszadovszky Assignee: Gidon Gershinsky / @ggershinsky

Note: This issue was originally created as PARQUET-2080. Please see the migration documentation for further details.

asfimport avatar Aug 30 '21 13:08 asfimport

Gabor Szadovszky / @gszadovszky: @ggershinsky, although the original topic of this Jira is invalid, we still need to add proper comments to RowGroup.file_offset that describe the situation in PARQUET-2078 and help implementations handle a potentially wrong value. Would you like to handle this?

asfimport avatar Sep 13 '21 09:09 asfimport
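
For readers wondering what "handling a potentially wrong value" could look like, here is a minimal sketch. It assumes a reader derives the row group's start from the first column chunk's page offsets rather than trusting RowGroup.file_offset. The classes and function names are hypothetical stand-ins, not any implementation's API; only the field names `file_offset`, `data_page_offset`, and `dictionary_page_offset` come from the Parquet format. This is not necessarily the approach the thread participants settled on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnChunkMeta:
    """Hypothetical stand-in for the column chunk metadata a reader sees."""
    data_page_offset: int
    dictionary_page_offset: Optional[int] = None


@dataclass
class RowGroupMeta:
    """Hypothetical stand-in for RowGroup; file_offset may be wrong or unset."""
    columns: List[ColumnChunkMeta]
    file_offset: Optional[int] = None


def column_start(col: ColumnChunkMeta) -> int:
    """Return the offset of the first page (dictionary or data) of a chunk."""
    dict_off = col.dictionary_page_offset
    # Assumption: a dictionary page, when present, precedes the data pages.
    if dict_off is not None and 0 < dict_off < col.data_page_offset:
        return dict_off
    return col.data_page_offset


def row_group_start(rg: RowGroupMeta) -> int:
    """Derive the row group start without consulting rg.file_offset."""
    if not rg.columns:
        raise ValueError("row group has no column chunks")
    return column_start(rg.columns[0])


def has_suspect_file_offset(rg: RowGroupMeta) -> bool:
    """Flag a file_offset that disagrees with the derived start."""
    return rg.file_offset is not None and rg.file_offset != row_group_start(rg)
```

A reader built this way would treat `file_offset` as advisory at most, for example logging a warning when `has_suspect_file_offset` returns true and continuing with the derived offset.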

Gidon Gershinsky / @ggershinsky: @gszadovszky yes, I'll take it. There might be a different solution (also format-related) that bypasses the need to calculate such a parameter in any implementation, so it can be fully deprecated. I'll get back with the details and we'll discuss the trade-offs.

asfimport avatar Sep 13 '21 09:09 asfimport

Gidon Gershinsky / @ggershinsky: Hi @gszadovszky, I've prepared a short writeup on this alternative solution, with a discussion of the trade-offs. After writing it, my feeling is that the trade-off is not in favor of this alternative option, but here it is, just to cover all bases. I would appreciate your opinion on this.

asfimport avatar Sep 28 '21 06:09 asfimport

Gabor Szadovszky / @gszadovszky: @ggershinsky, could you make the doc available for comments?

asfimport avatar Sep 28 '21 09:09 asfimport

Gidon Gershinsky / @ggershinsky: Oh, sorry, done.

asfimport avatar Sep 28 '21 11:09 asfimport