
Handle functions with datatypes other than Lists of Records

Open jonathanmaw opened this issue 2 years ago • 2 comments

The Spark backend currently only handles functions that have Lists as arguments (https://github.com/finos/morphir-elm/blob/5c0c516169914792291f68f7e99a25776009b160/src/Morphir/Spark/Backend.elm#L219).

This is because it is assumed that every Spark function takes a dataframe as an argument and returns a dataframe.

Real-world usage, however, does not always take and return a List of Records. The Christmas Bonanza example (https://github.com/finos/morphir-elm/blob/5c0c516169914792291f68f7e99a25776009b160/tests-integration/reference-model/src/Morphir/Reference/Model/Sample/Rules/Income/Antique.elm#L110) takes a List of Antiques and returns a PriceRange (a tuple of two prices).

The suggested solution: instead of treating any other value as an error, attempt to construct a full function (including converting the input from whatever input type into a dataframe, and the output from whatever returned type back into a dataframe), and only return an error if that isn't possible.

jonathanmaw avatar Jul 11 '22 16:07 jonathanmaw

This is required to write idiomatic Elm aggregations, e.g.

source
    |> List.map .ageOfItem
    |> List.minimum

jonathanmaw avatar Jul 22 '22 14:07 jonathanmaw

I've looked deeper into the code for datatypes other than Lists of Records, and there seem to be no restrictions specific to returning a List of Records: while mapFunctionDefinition in src/Morphir/Spark/Backend.elm restricts the input types to Lists only, the return type of the function is never inspected, and the generated function declaration forces the return type to be a Spark DataFrame regardless of what the return type was in the IR.

I've identified some use-cases:

Elm-style Aggregations

(see also #844) i.e. code of the form

source
    |> List.map .fieldName
    |> List.sum

And the corresponding Spark code is

source.select(sum(col("fieldName")).alias("fieldName"))

Fortunately, the List.map expression source |> List.map .fieldName produces an ObjectExpression Select ["fieldName", Function "sum" [...]] source.

The solution to implementing this seems to be:

  • Refactor mapSDKFunctions so that the args can be a list of Expressions, not just TypedValues.
  • When objectExpressionFromValue receives a function that returns a single basic type (i.e. Int, Float, String, Bool) or a Maybe of one of those types:
      • Try to create an ObjectExpression from the arguments to that function.
      • If that ObjectExpression is a Select with only one argument, extract the Name and Expression, plus the Source.
      • Then call mapSDKFunctions on the root function, with the Expression as args, and construct a new Select ObjectExpression from our new Expression, the Name, and the Source.
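The steps above can be sketched in Python with hypothetical stand-in types (these dataclasses and the wrap_aggregation helper are illustrations of the proposed rewrite, not the actual Morphir backend code):

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-ins for the backend's Expression / ObjectExpression types.
@dataclass
class Column:
    name: str

@dataclass
class Function:
    name: str
    args: list

@dataclass
class Select:
    named_expressions: List[Tuple[str, object]]  # (alias, expression) pairs
    source: str

def wrap_aggregation(agg_name: str, mapped: Select) -> Select:
    """Rewrite `source |> List.map .field |> List.<agg>`:
    take the single-column Select produced by the List.map step, wrap its
    expression in the aggregation function, and keep the alias and source."""
    if len(mapped.named_expressions) != 1:
        raise ValueError("expected a Select with exactly one named expression")
    alias, inner = mapped.named_expressions[0]
    return Select([(alias, Function(agg_name, [inner]))], mapped.source)

# `source |> List.map .ageOfItem` as a single-column Select:
mapped = Select([("ageOfItem", Column("ageOfItem"))], "source")
# `... |> List.minimum` wraps the column in an aggregation:
aggregated = wrap_aggregation("min", mapped)
```

The result corresponds to the Spark code source.select(min(col("ageOfItem")).alias("ageOfItem")).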

Aggregations grouped in a tuple

See also #792

  • When objectExpressionFromValue receives a tuple constructor, create ObjectExpressions for every value inside the tuple constructor.
  • Apply the constraint that the created ObjectExpressions are only Selects with one NamedExpression, and they all have identical Sources.
  • Use all the NamedExpressions as the NamedExpressions inside a new Select with the Source they all share.

Note: I'm not sure yet how filter expressions should be handled here, too.
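The tuple case can be sketched the same way, again with hypothetical stand-in types rather than the actual backend code (the constraint checks mirror the bullets above):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Select:
    named_expressions: List[Tuple[str, object]]  # (alias, expression) pairs
    source: str

def merge_tuple_selects(selects: List[Select]) -> Select:
    """Combine the Select built for each element of a tuple constructor
    into one Select: every element must be a single-NamedExpression Select,
    and all of them must share an identical source."""
    sources = {s.source for s in selects}
    if len(sources) != 1:
        raise ValueError("all tuple elements must select from the same source")
    named = []
    for s in selects:
        if len(s.named_expressions) != 1:
            raise ValueError("each tuple element must be a single-column Select")
        named.extend(s.named_expressions)
    return Select(named, sources.pop())
```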

Aggregations grouped in a Record

There are no issues or examples for this use case, but it's provided for completeness.

The process is identical to the tuple case except we also have our own names to replace for each of the NamedExpressions.
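As a sketch (same hypothetical stand-in types as before, not the actual backend code), the only difference from the tuple case is that the record's own field names replace the aliases of the inner NamedExpressions:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Select:
    named_expressions: List[Tuple[str, object]]  # (alias, expression) pairs
    source: str

def merge_record_selects(fields: List[Tuple[str, Select]]) -> Select:
    """fields: (record field name, single-column Select) pairs.
    As in the tuple case, all Selects must share one source; the record's
    field names become the aliases of the merged NamedExpressions."""
    sources = {s.source for _, s in fields}
    if len(sources) != 1:
        raise ValueError("all record fields must select from the same source")
    named = []
    for field_name, s in fields:
        if len(s.named_expressions) != 1:
            raise ValueError("each record field must be a single-column Select")
        _, expr = s.named_expressions[0]
        named.append((field_name, expr))
    return Select(named, sources.pop())
```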

Aggregations grouped in a singleton List of a Record

AKA the Pedantic Elm case. This has already been implemented, but could be simplified and made more readable once the above is in place.

jonathanmaw avatar Aug 12 '22 15:08 jonathanmaw