Add struct support for pyspark UDF map operations
Is your feature request related to a problem? Please describe. To return multiple columns from a PySpark UDF, you need to declare a struct as the return type. Currently Hamilton only handles Python primitives, so a UDF function can only return a single column.
Describe the solution you'd like For vanilla UDFs:
- the user should be able to specify a tuple of primitives as an output to a function, in addition to the names of the outputs.
- the framework should then handle creating the appropriate types for the UDF.
- the framework should then handle flattening these outputs, so they can more easily be used for downstream UDFs.
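To make the three points above concrete, here is a rough sketch of how the type derivation and flattening could work. It is not Hamilton's implementation: `struct_ddl_for`, `flattened_field_paths`, the primitive-to-DDL mapping, and the example function `total_and_mean` are all hypothetical names made up for illustration. It is plain Python with no Spark dependency, and it only builds the type string and field paths.

```python
from typing import Callable, Dict, List, Sequence, Tuple, get_args, get_origin, get_type_hints

# Assumed mapping from Python primitives to Spark SQL DDL type names.
_PRIMITIVE_TO_DDL: Dict[type, str] = {
    int: "bigint",
    float: "double",
    str: "string",
    bool: "boolean",
}


def struct_ddl_for(fn: Callable, output_names: Sequence[str]) -> str:
    """Build a DDL struct type string from a function's Tuple[...] return annotation."""
    return_type = get_type_hints(fn).get("return")
    if get_origin(return_type) is not tuple:
        raise TypeError(f"{fn.__name__} must be annotated as returning Tuple[...]")
    element_types = get_args(return_type)
    if len(element_types) != len(output_names):
        raise ValueError("need exactly one output name per tuple element")
    fields = []
    for name, py_type in zip(output_names, element_types):
        if py_type not in _PRIMITIVE_TO_DDL:
            raise TypeError(f"unsupported element type for {name!r}: {py_type!r}")
        fields.append(f"{name}:{_PRIMITIVE_TO_DDL[py_type]}")
    return "struct<" + ",".join(fields) + ">"


def flattened_field_paths(struct_column: str, output_names: Sequence[str]) -> List[str]:
    """List the nested field paths that would be selected to flatten the struct."""
    return [f"{struct_column}.{name}" for name in output_names]


# Hypothetical user function: a tuple of primitives as output.
def total_and_mean(a: int, b: int) -> Tuple[int, float]:
    total = a + b
    return total, total / 2


schema = struct_ddl_for(total_and_mean, ["total", "mean"])
paths = flattened_field_paths("total_and_mean", ["total", "mean"])
```

The resulting string could presumably be passed as the `returnType` of `pyspark.sql.functions.udf`, which accepts a DDL-formatted type string, and the field paths selected afterwards to produce one column per output. How the output names get attached to the function (decorator, argument, or something else) is left open here.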
Describe alternatives you've considered Not doing this.
Additional context If pandas UDFs support map operations of the type series -> dataframe, then we should look into supporting that as well.
Feels like the same pattern as extract_columns, no?
Yes, we'll have to see how we could make it work with that.