Allow logical expressions to express a cast to an extension type

Open paleolimbot opened this issue 2 months ago • 1 comments

Which issue does this PR close?

  • Closes #18060.

I am sorry that I missed the previous PR implementing this ( https://github.com/apache/datafusion/pull/18120 ) and I'm also happy to review that one instead of updating this!

Rationale for this change

Other systems that interact with the logical plan (e.g., SQL, Substrait) can express types that are not strictly within the Arrow DataType enum, so a logical cast needs a way to name such a type as its destination.

What changes are included in this PR?

For the Cast and TryCast structs, the destination data type was changed from a DataType to a FieldRef.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. Any code that uses Cast { .. } to create an expression will need to use Cast::new() instead (or pass on the field metadata if it has it). Existing matches will need to be updated for the data_type -> field member rename.
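To make the migration concrete, here is a minimal sketch using stand-in types rather than DataFusion's or arrow's real definitions. The DataType, Field, Expr and Cast types below are simplified assumptions, and new_from_field() and uuid_field() are hypothetical names used only for illustration. The one detail taken from the Arrow format itself is that an extension type travels in field metadata under the ARROW:extension:name key, with arrow.uuid stored as a 16-byte fixed-size binary.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for arrow's DataType; only the variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    FixedSizeBinary(i32),
}

// Stand-in for arrow's Field: a storage type plus string metadata.
// (The real Field also has a name and a nullability flag.)
#[derive(Debug, Clone, PartialEq)]
struct Field {
    data_type: DataType,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

// Stand-in for the logical expression enum.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Cast(Cast),
}

#[derive(Debug, Clone, PartialEq)]
struct Cast {
    expr: Box<Expr>,
    // Before this change the member was `data_type: DataType`.
    field: FieldRef,
}

impl Cast {
    // Callers that only have a DataType keep working through the constructor,
    // which wraps the type in a field with no metadata.
    fn new(expr: Box<Expr>, data_type: DataType) -> Self {
        let field = Field { data_type, metadata: HashMap::new() };
        Self { expr, field: Arc::new(field) }
    }

    // Hypothetical constructor for callers that already hold field metadata.
    fn new_from_field(expr: Box<Expr>, field: FieldRef) -> Self {
        Self { expr, field }
    }
}

// A match that used to bind `data_type` now binds `field`.
fn target_type(expr: &Expr) -> Option<&DataType> {
    match expr {
        // Before: Expr::Cast(Cast { data_type, .. }) => Some(data_type),
        Expr::Cast(Cast { field, .. }) => Some(&field.data_type),
        _ => None,
    }
}

// Hypothetical helper: a field describing the arrow.uuid extension type.
fn uuid_field() -> FieldRef {
    let mut metadata = HashMap::new();
    metadata.insert("ARROW:extension:name".to_string(), "arrow.uuid".to_string());
    Arc::new(Field { data_type: DataType::FixedSizeBinary(16), metadata })
}
```

With these stand-ins, Cast::new(Box::new(Expr::Column("a".to_string())), DataType::Int64) yields a cast whose field has empty metadata, while Cast::new_from_field(.., uuid_field()) keeps the extension name alongside the storage type.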

paleolimbot avatar Oct 17 '25 17:10 paleolimbot

@alamb If updating DataTypes to FieldRefs is still what we're doing in the logical plan, this PR is ready for review!

paleolimbot avatar Dec 02 '25 00:12 paleolimbot

This makes sense in general! I only got partially into the review before realizing it conflicts with https://github.com/apache/datafusion/pull/19097. Maybe that one is the issue to focus on?

adriangb avatar Dec 11 '25 20:12 adriangb

Thank you for reviewing!

I didn't see that PR, but it looks like it focuses on the physical cast. This PR doesn't really have anything to do with the physical cast: actually executing a cast to an extension type requires a registry of how exactly that should be done, which we don't have yet. I'm sure there are some merge conflicts between the two, though.

paleolimbot avatar Dec 11 '25 21:12 paleolimbot

Oh right, two different layers! But they're both essentially adding a Field to the cast operators, which is interesting and is why I got confused.

I will try to take another look here tomorrow.

adriangb avatar Dec 11 '25 22:12 adriangb

Is there an example of actually customizing casting, or an issue tracking getting us there?

There's an issue to create a registry for dyn Extensiony things and a POC PR with not much discussion driving a DataFusion-based solution ( https://github.com/apache/datafusion/issues/18223 ). I added a comment there on how we might go about that using the registry. This exact PR allows DataFusion-based projects to implement a workaround (for example, in SedonaDB I'm planning to transform the logical plan so that a cast to an extension type is rewritten into a scalar function call to sd_cast() ). Casts show up in SQL -> LogicalPlan internals and SQL types are now FieldRefs too, so my personal next step was to see if we now have enough to add a UUID type in SQL.
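A rough sketch of what that kind of workaround could look like, again with stand-in types instead of DataFusion's real Expr and hand-written recursion instead of its tree-rewriting utilities. Only the sd_cast name comes from the comment above; its argument list here (the input expression plus the extension name as a literal) is an assumption, as are the Field and Expr shapes.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Metadata key the Arrow format uses to mark a field as an extension type.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

// Stand-in for arrow's Field, with the storage type reduced to a string.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    storage_type: String,
    metadata: HashMap<String, String>,
}

// Stand-in for the logical expression enum.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    Cast { expr: Box<Expr>, field: Arc<Field> },
    ScalarFunction { name: String, args: Vec<Expr> },
}

// Walk the expression bottom-up and replace every cast whose destination field
// carries an extension name with a call to `sd_cast`; other casts are kept.
fn rewrite_extension_casts(expr: Expr) -> Expr {
    match expr {
        Expr::Cast { expr, field } => {
            let input = rewrite_extension_casts(*expr);
            // Clone the name first so `field` can be moved in the `None` arm.
            let ext_name = field.metadata.get(EXTENSION_NAME_KEY).cloned();
            match ext_name {
                // Assumed call shape: sd_cast(input, '<extension name>').
                Some(name) => Expr::ScalarFunction {
                    name: "sd_cast".to_string(),
                    args: vec![input, Expr::Literal(name)],
                },
                None => Expr::Cast { expr: Box::new(input), field },
            }
        }
        Expr::ScalarFunction { name, args } => Expr::ScalarFunction {
            name,
            args: args.into_iter().map(rewrite_extension_casts).collect(),
        },
        other => other,
    }
}
```

The rewrite only works because the destination is a field: with a bare DataType there would be no metadata to inspect, and a cast to arrow.uuid would be indistinguishable from a cast to its storage type.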

paleolimbot avatar Dec 16 '25 18:12 paleolimbot