[Kernel] Add widening type conversions to Kernel default parquet reader

Open johanl-db opened this issue 6 months ago • 0 comments

Which Delta project/connector is this regarding?

  • [ ] Spark
  • [ ] Standalone
  • [ ] Flink
  • [x] Kernel
  • [ ] Other (fill in here)

Description

Add a set of conversions to the default Parquet reader provided by Kernel so that a column can be read using a wider type than the one actually stored in the Parquet file. This will support the type widening table feature, see https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md.

Conversions added:

  • INT32 -> long
  • FLOAT -> double
  • decimal precision/scale increase
  • DATE -> timestamp_ntz
  • INT32 -> double
  • integers -> decimal
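
For illustration only, the conversions listed above can be sketched as value-level rules. This is not the Kernel implementation (Kernel is written in Java); the names `widen`, `DecimalType` and `can_widen_decimal` are hypothetical. The sketch assumes decimals are handled as unscaled integers, that DATE is days since the Unix epoch and timestamp_ntz is microseconds since the epoch, and that a decimal widening may not lose integer digits or scale.

```python
from dataclasses import dataclass

MICROS_PER_DAY = 86_400 * 1_000_000
INT32_DIGITS = 10  # assumption: decimal digits needed to hold any INT32 value


@dataclass(frozen=True)
class DecimalType:
    """Hypothetical stand-in for a decimal(precision, scale) type."""
    precision: int
    scale: int


def can_widen_decimal(src: DecimalType, dst: DecimalType) -> bool:
    # Assumed rule: scale may only grow, and integer digits must not shrink.
    return (dst.scale >= src.scale
            and dst.precision - dst.scale >= src.precision - src.scale)


def widen(value, src, dst):
    """Convert a value stored as `src` into the wider read type `dst`."""
    if src == "int32" and dst == "long":
        return value  # same integer, wider storage
    if src in ("int32", "float") and dst == "double":
        return float(value)
    if src == "date" and dst == "timestamp_ntz":
        return value * MICROS_PER_DAY  # days -> microseconds since epoch
    if src == "int32" and isinstance(dst, DecimalType):
        if dst.precision - dst.scale >= INT32_DIGITS:
            return value * 10 ** dst.scale  # unscaled decimal value
    if isinstance(src, DecimalType) and isinstance(dst, DecimalType):
        if can_widen_decimal(src, dst):
            # Rescale the unscaled value to the larger scale.
            return value * 10 ** (dst.scale - src.scale)
    raise ValueError(f"unsupported conversion: {src} -> {dst}")
```

For example, `widen(1234, DecimalType(5, 2), DecimalType(7, 3))` rescales the unscaled value of 12.34 to that of 12.340, while a target such as `DecimalType(5, 3)` is rejected because it would drop an integer digit.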

How was this patch tested?

Added tests covering all conversions in ParquetColumnReaderSuite

Does this PR introduce any user-facing changes?

This change alone doesn't allow reading Delta tables that use the type widening table feature; that feature is still unsupported. It does allow reading Delta tables whose Parquet files contain types that differ from the table schema, but that should never happen for tables that don't support type widening.

johanl-db, Aug 13 '24 15:08