parquet-java
Use FixedSizeBinary instead of Binary for int96 conversion when convertInt96ToArrowTimestamp is false
Describe the enhancement requested
public TypeMapping convertINT96(PrimitiveTypeName primitiveTypeName) throws RuntimeException {
  if (convertInt96ToArrowTimestamp) {
    return field(new ArrowType.Timestamp(TimeUnit.NANOSECOND, null));
  } else {
    return field(new ArrowType.Binary());
  }
}
When converting a Parquet type to an Arrow type, if the original type is INT96 and the option convertInt96ToArrowTimestamp is set to false, the resulting Arrow type is Binary, which is variable-length. Since an INT96 value is always exactly 12 bytes, FixedSizeBinary with a byte width of 12 would be more appropriate.
Component(s)
No response
I agree that it would be more appropriate to return a FixedSizeBinary. Let's ask @wgtmac, since he's the expert on Arrow, just to make sure that there is no historical reason to return Binary instead.
P.S. Keep in mind that INT96 is deprecated.
Thanks for pinging me @Fokko!
I agree that FixedSizeBinary is more appropriate than Binary. However, I would argue that it is invalid to use INT96 for non-timestamp types, so I think it is better to ignore convertInt96ToArrowTimestamp and return Timestamp directly.
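
For context on both points raised in this thread, here is a minimal, self-contained sketch of how an INT96 value is conventionally interpreted as a timestamp. It assumes the legacy layout used by Impala, Hive, and Spark writers: 8 bytes of nanoseconds within the day followed by 4 bytes of Julian day number, both little-endian. The class and method names (Int96Sketch, toEpochNanos, fromEpochNanos) are hypothetical and are not part of parquet-java or Arrow; this is an illustration, not the library's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only (not parquet-java code).
public class Int96Sketch {
    // An INT96 value is always 96 bits, i.e. 12 bytes.
    static final int INT96_BYTE_WIDTH = 12;
    // Julian day number of 1970-01-01, the Unix epoch.
    static final long JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588L;
    static final long NANOS_PER_DAY = 86_400L * 1_000_000_000L;

    // Decode a 12-byte INT96 value into nanoseconds since the Unix epoch.
    // Assumed layout: nanos-of-day (8 bytes) then Julian day (4 bytes), little-endian.
    static long toEpochNanos(byte[] int96) {
        if (int96.length != INT96_BYTE_WIDTH) {
            throw new IllegalArgumentException(
                "INT96 must be exactly 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        long daysSinceEpoch = julianDay - JULIAN_DAY_OF_UNIX_EPOCH;
        // Exact arithmetic so out-of-range values fail loudly instead of wrapping.
        return Math.addExact(Math.multiplyExact(daysSinceEpoch, NANOS_PER_DAY), nanosOfDay);
    }

    // Encode nanoseconds since the Unix epoch into the same 12-byte layout.
    static byte[] fromEpochNanos(long epochNanos) {
        long daysSinceEpoch = Math.floorDiv(epochNanos, NANOS_PER_DAY);
        long nanosOfDay = Math.floorMod(epochNanos, NANOS_PER_DAY);
        return ByteBuffer.allocate(INT96_BYTE_WIDTH)
            .order(ByteOrder.LITTLE_ENDIAN)
            .putLong(nanosOfDay)
            .putInt((int) (daysSinceEpoch + JULIAN_DAY_OF_UNIX_EPOCH))
            .array();
    }

    public static void main(String[] args) {
        byte[] encoded = fromEpochNanos(1_700_000_000_123_456_789L);
        System.out.println("byte width: " + encoded.length);
        System.out.println("epoch nanos: " + toEpochNanos(encoded));
    }
}
```

Every encoded value has the same width, which is the case for a fixed-size type over a variable-length one, and the bytes only carry meaning as a day plus a time of day, which is the case for returning a timestamp.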