parquet-java

Use FixedSizeBinary instead of Binary for int96 conversion when convertInt96ToArrowTimestamp is false

Open doki23 opened this issue 1 year ago • 2 comments

Describe the enhancement requested

public TypeMapping convertINT96(PrimitiveTypeName primitiveTypeName) throws RuntimeException {
  if (convertInt96ToArrowTimestamp) {
    return field(new ArrowType.Timestamp(TimeUnit.NANOSECOND, null));
  } else {
    return field(new ArrowType.Binary());
  }
}

When converting a Parquet type to an Arrow type, if the original type is INT96 and the option convertInt96ToArrowTimestamp is set to false, the resulting Arrow type is Binary, which is variable-width. Because an INT96 value is always exactly 12 bytes, FixedSizeBinary with a byte width of 12 might be more appropriate.
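To illustrate why a fixed width fits, here is a minimal, standard-library-only sketch of the conventional INT96 timestamp layout: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. The class and method names (`Int96Sketch`, `toEpochNanos`, `fromEpochNanos`) are hypothetical and are not part of parquet-java or Arrow. It only shows that every value occupies exactly 12 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not a parquet-java or Arrow API.
class Int96Sketch {
    static final int INT96_BYTE_WIDTH = 12;               // 96 bits
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;   // Julian day number of 1970-01-01
    static final long NANOS_PER_DAY = 86_400L * 1_000_000_000L;

    // Decode a 12-byte INT96 timestamp into nanoseconds since the Unix epoch.
    static long toEpochNanos(byte[] int96) {
        if (int96.length != INT96_BYTE_WIDTH) {
            throw new IllegalArgumentException("INT96 must be exactly 12 bytes, got " + int96.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();   // first 8 bytes: nanoseconds within the day
        long julianDay = buf.getInt();     // last 4 bytes: Julian day number
        long daysSinceEpoch = julianDay - JULIAN_DAY_OF_EPOCH;
        // Exact arithmetic so out-of-range values fail loudly instead of wrapping.
        return Math.addExact(Math.multiplyExact(daysSinceEpoch, NANOS_PER_DAY), nanosOfDay);
    }

    // Encode nanoseconds since the Unix epoch into the 12-byte INT96 layout.
    static byte[] fromEpochNanos(long epochNanos) {
        long daysSinceEpoch = Math.floorDiv(epochNanos, NANOS_PER_DAY);
        long nanosOfDay = Math.floorMod(epochNanos, NANOS_PER_DAY);
        return ByteBuffer.allocate(INT96_BYTE_WIDTH)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt((int) (daysSinceEpoch + JULIAN_DAY_OF_EPOCH))
                .array();
    }

    public static void main(String[] args) {
        byte[] encoded = fromEpochNanos(1_700_000_000_123_456_789L);
        System.out.println("width = " + encoded.length);
        System.out.println("round trip = " + toEpochNanos(encoded));
    }
}
```

Under this layout the non-timestamp branch of convertINT96 could plausibly return `new ArrowType.FixedSizeBinary(12)` rather than `new ArrowType.Binary()`. Whether that change is compatible with existing readers is for the maintainers to decide.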

Component(s)

No response

doki23, Nov 29 '24 07:11

I agree that it would be more appropriate to return a FixedSizeBinary. Let's ask @wgtmac, since he's the expert on Arrow, just to make sure that there is no historical reason to return Binary instead.

PS: Keep in mind that INT96 is deprecated.

Fokko, Dec 02 '24 06:12

Thanks for pinging me @Fokko!

I agree that FixedSizeBinary is more appropriate than Binary. However, I would argue that it is invalid to use INT96 for a non-timestamp type, so I think it is better to ignore convertInt96ToArrowTimestamp and return Timestamp directly.

wgtmac, Dec 02 '24 08:12