Implement withField and dropFields for struct types

Open andygrove opened this issue 1 year ago • 4 comments

What is the problem the feature request solves?

See the Spark documentation for more details:

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.withField.html

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.dropFields.html

Describe the potential solution

No response

Additional context

No response

andygrove avatar Aug 12 '24 17:08 andygrove

take

dharanad avatar Aug 13 '24 18:08 dharanad

I believe these exist only at the analysis level and end up as plain named_struct expressions in the physical plan, so they're probably already supported.

Kimahriman avatar Aug 14 '24 13:08 Kimahriman

My understanding is the same as @Kimahriman's: both of these are implemented in terms of UpdateFields, which is replaced by the Spark Catalyst rule ReplaceUpdateFieldsExpression (https://github.com/apache/spark/blob/v3.5.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UpdateFields.scala#L79-L87). Based on https://github.com/apache/spark/blob/v3.5.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L746-L757, it looks like it is replaced by a CreateNamedStruct expression and should therefore already be supported.

eejbyfeldt avatar Aug 20 '24 07:08 eejbyfeldt

It sounds like we just need to add tests and update the documentation to confirm this. I have added this to the 0.8.0 milestone.

andygrove avatar Apr 03 '25 13:04 andygrove
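
For readers following the thread, the rewrite discussed above can be illustrated with a small pure-Python model. This is only a sketch of the semantics as I understand them from the PySpark documentation, not Spark's or Comet's implementation: the names `Struct`, `with_field`, `drop_fields`, and `to_named_struct_args` are hypothetical, a struct is modelled as an ordered list of (name, value) pairs, and the final helper flattens it into the alternating name/value argument list that a named_struct expression takes.

```python
from typing import Any, List, Tuple

# Hypothetical model: a struct is an ordered list of (field name, value) pairs.
Struct = List[Tuple[str, Any]]


def with_field(struct: Struct, name: str, value: Any) -> Struct:
    """Replace every field called `name` in place, or append it if absent."""
    if any(n == name for n, _ in struct):
        return [(n, value if n == name else v) for n, v in struct]
    return struct + [(name, value)]


def drop_fields(struct: Struct, *names: str) -> Struct:
    """Remove the named fields; names that are not present are ignored."""
    remaining = [(n, v) for n, v in struct if n not in names]
    if not remaining:
        # Assumption: dropping every field is rejected, as in Spark.
        raise ValueError("cannot drop all fields in struct")
    return remaining


def to_named_struct_args(struct: Struct) -> List[Any]:
    """Flatten to the alternating name, value list that named_struct takes."""
    args: List[Any] = []
    for n, v in struct:
        args.extend([n, v])
    return args


s = [("a", 1), ("b", 2)]
updated = drop_fields(with_field(with_field(s, "b", 9), "c", 3), "a")
print(to_named_struct_args(updated))  # ['b', 9, 'c', 3]
```

The point of the model is that neither operation survives as its own expression: each call yields a complete new field list, which is why support for named_struct should be enough once tests confirm it.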