Support Utf8View and BinaryView in substrait serialization.

Open wiedld opened this issue 1 year ago • 0 comments

Which issue does this PR close?

Closes #12118

Rationale for this change

We have two new view data types, Utf8View and BinaryView. Support in datafusion is part of this epic, and this specific PR is about adding support for the (de-)serialization of logical and physical plans into the substrait format.

This PR adds new substrait variations on existing type classes. For example, there is a "string" substrait class which can have different variations representing different physical types (e.g. Utf8 vs LargeUtf8 vs Utf8View). If we serialize using string variation=2 (e.g. view physical type), then the deserialization of variation=2 will give us back the Utf8View. More background is given here.
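The variation roundtrip described above can be sketched as follows. This is a minimal, self-contained illustration, not the code from this PR: `StringType`, `to_variation`, and `from_variation` are hypothetical stand-ins rather than DataFusion or Substrait APIs. Only the value 2 for the view variation comes from the description above; the values 0 and 1 for the other two variations are assumptions.

```rust
/// Hypothetical stand-in for the Arrow string types named above.
/// This is not arrow's `DataType`; it only models the three physical layouts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StringType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

// Variation references carried on the Substrait "string" type class.
// Only VIEW_VARIATION = 2 is stated in the PR text; 0 and 1 are assumed.
const DEFAULT_VARIATION: u32 = 0;
const LARGE_VARIATION: u32 = 1;
const VIEW_VARIATION: u32 = 2;

/// Serialization side: pick the variation that records the physical type.
fn to_variation(t: StringType) -> u32 {
    match t {
        StringType::Utf8 => DEFAULT_VARIATION,
        StringType::LargeUtf8 => LARGE_VARIATION,
        StringType::Utf8View => VIEW_VARIATION,
    }
}

/// Deserialization side: recover the physical type from the variation.
/// Unknown variations are rejected instead of silently defaulting.
fn from_variation(v: u32) -> Result<StringType, String> {
    match v {
        DEFAULT_VARIATION => Ok(StringType::Utf8),
        LARGE_VARIATION => Ok(StringType::LargeUtf8),
        VIEW_VARIATION => Ok(StringType::Utf8View),
        other => Err(format!("unsupported string type variation: {other}")),
    }
}
```

Under these assumptions, `from_variation(to_variation(StringType::Utf8View))` yields `Ok(StringType::Utf8View)`, which is the roundtrip property the logical plan tests rely on. The same pattern would apply to the binary type class and BinaryView.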

What changes are included in this PR?

  • feat(12118): logical plan support for Utf8View (d7be771eb)
  • feat(12118): physical plan support for Utf8View (b17ae25a7)
  • feat(12118): logical plan support for BinaryView (f38085d4c)
  • feat(12118): physical plan support for BinaryView (5c4ebec5c)

Are these changes tested?

Logical plan: The Utf8View and BinaryView are covered in the logical plan roundtrip serialization tests.

Physical plan: The physical plan roundtrip serialization tests are not yet implemented. There is an ongoing epic to finish the physical plan serialization. As such, I added code for the physical plan substrait handling of Utf8View and BinaryView (to avoid incurring more tech debt), but this code is not tested.

Are there any user-facing changes?

No API contract change. Using these new datatypes in substrait serialization no longer returns "unimplemented" errors.

wiedld, Aug 27 '24 18:08