planetiler

[BUG] Parsing Exception for FIXED_LEN_BYTE_ARRAY Data in Parquet File

Open • CrazyBug-11 opened this issue 1 year ago • 1 comment

In my dataset, there is a shape_area field defined as follows: [image: field definition, not included]

During parsing, I found that the data values become excessively large. For example, the original value 173.24927660400 is parsed as 1.73249276604E24.

After investigating the code, I found an issue in the ParquetPrimitiveConverter class on line 102, where the scale is negated:

int scale = -decimal.getScale();

When I modified the code to use int scale = decimal.getScale();, the parsed data values were correct.

I would like to understand if there is any specific reason for negating the scale (-decimal.getScale())? Does it serve any special purpose, or is it a mistake?

CrazyBug-11, Nov 19 '24 12:11
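The numbers in the report are consistent with a sign flip in the scale. The following is a minimal sketch using only java.math, not planetiler's actual code. It assumes shape_area has a scale of 11, which is inferred from the eleven fractional digits in the example value and not confirmed by the schema. The class and the decode helper are hypothetical names chosen for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class ScaleSignDemo {
  // Hypothetical helper: BigDecimal interprets its arguments as
  // unscaled * 10^(-scale), so a positive scale shifts digits to the right
  // of the decimal point.
  static BigDecimal decode(BigInteger unscaled, int scale) {
    return new BigDecimal(unscaled, scale);
  }

  public static void main(String[] args) {
    // 173.24927660400 stored as an unscaled integer with an assumed scale of 11.
    BigInteger unscaled = new BigInteger("17324927660400");
    int scale = 11; // assumption, inferred from the example value

    // Scale passed through unchanged: prints 173.24927660400
    System.out.println(decode(unscaled, scale));

    // Scale negated: the value becomes 17324927660400 * 10^11,
    // printed as 1.7324927660400E+24
    System.out.println(decode(unscaled, -scale));
  }
}
```

Under that assumption, negating the scale multiplies the value by 10^22, which matches the jump from 173.24927660400 to 1.73249276604E24 described above.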

int scale = -decimal.getScale();

That line comes from this section of the Parquet format spec:

https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

It might be reversed, though. It would be good to confirm how another tool interprets that field to be sure.

msbarry, Nov 23 '24 15:11
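For reference, the linked spec section defines a decimal's value as unscaledValue * 10^(-scale), with the unscaled value of a FIXED_LEN_BYTE_ARRAY stored as big-endian two's complement. java.math.BigDecimal documents the same convention for its (unscaledVal, scale) constructor. The following is a minimal sketch of decoding under that reading. The class and both helpers are hypothetical and not part of planetiler or parquet-java, and the 16-byte width and the scale values are assumptions for illustration only.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

class FixedLenDecimalDemo {
  // Hypothetical helper: interpret the bytes as a big-endian two's-complement
  // integer, then apply the scale with the same sign the schema reports.
  static BigDecimal decodeDecimal(byte[] bytes, int scale) {
    BigInteger unscaled = new BigInteger(bytes);
    return new BigDecimal(unscaled, scale);
  }

  // Hypothetical helper for building test input: sign-extend the minimal
  // two's-complement encoding to a fixed width. Assumes the value fits.
  static byte[] toFixedLen(BigInteger value, int length) {
    byte[] minimal = value.toByteArray();
    byte[] out = new byte[length];
    Arrays.fill(out, (byte) (value.signum() < 0 ? 0xFF : 0x00));
    System.arraycopy(minimal, 0, out, length - minimal.length, minimal.length);
    return out;
  }

  public static void main(String[] args) {
    // Assumed 16-byte column width and scale 11 for the value from the report.
    byte[] positive = toFixedLen(new BigInteger("17324927660400"), 16);
    System.out.println(decodeDecimal(positive, 11)); // 173.24927660400

    // A negative value exercises the sign extension: -150 with scale 2.
    byte[] negative = toFixedLen(new BigInteger("-150"), 16);
    System.out.println(decodeDecimal(negative, 2)); // -1.50
  }
}
```

This only shows that the two conventions agree on the sign of the scale. It does not replace checking the same file with a second Parquet reader, as suggested above.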