
Handle functions with datatypes other than Lists of Records

Open jonathanmaw opened this issue 2 years ago • 2 comments

The Spark backend currently only handles functions that have Lists as arguments (https://github.com/finos/morphir-elm/blob/5c0c516169914792291f68f7e99a25776009b160/src/Morphir/Spark/Backend.elm#L219).

This is because it is assumed that every Spark function takes a dataframe as an argument and returns a dataframe.

Real-world usage, however, does not always take and return a List of Records. The Christmas Bonanza example (https://github.com/finos/morphir-elm/blob/5c0c516169914792291f68f7e99a25776009b160/tests-integration/reference-model/src/Morphir/Reference/Model/Sample/Rules/Income/Antique.elm#L110) takes a List of Antiques and returns a PriceRange (a tuple of two prices).

The suggested solution: instead of treating any other value as an error, attempt to construct a full function (including converting the input from whatever input type into a dataframe, and the output from whatever returned type back into a dataframe), and only return an error if that isn't possible.

jonathanmaw avatar Jul 11 '22 16:07 jonathanmaw

This is required to write idiomatic Elm aggregations, e.g.

source
    |> List.map .ageOfItem
    |> List.minimum

jonathanmaw avatar Jul 22 '22 14:07 jonathanmaw

I've looked deeper into the code for datatypes other than Lists of Records, and there seem to be no restrictions specific to returning a List of Records: while mapFunctionDefinition in src/Morphir/Spark/Backend.elm restricts the input types to Lists only, the return type of the function is never inspected, and the generated function declaration forces the return type to be a Spark DataFrame regardless of what the return type was in the IR.

I've identified some use-cases:

Elm-style Aggregations

(see also #844) i.e. code of the form

source
    |> List.map .fieldName
    |> List.sum

And the corresponding Spark code is

source.select(sum(col("fieldName")).alias("fieldName"))

Fortunately, the List.map expression source |> List.map .fieldName produces an ObjectExpression Select ["fieldName", Function "sum" [...]] source.

The solution to implementing this seems to be:

  • Refactor mapSDKFunctions so that the args can be a list of Expressions, not just TypedValues.
  • When objectExpressionFromValue receives a function that returns a single basic type (i.e. Int, Float, String, Bool) or a Maybe of one of those types:
      • Try to create an ObjectExpression from the arguments to that function.
      • If that ObjectExpression is a Select with only one argument, extract the Name and Expression, plus the Source.
      • Then call mapSDKFunctions on the root function, with the Expression as args, and construct a new Select ObjectExpression from our new Expression, the Name, and the Source.
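The steps above can be sketched in Python with hypothetical stand-in types (these dataclasses and the wrap_aggregation helper are illustrations of the proposed rewrite, not the actual Morphir backend code):

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-ins for the backend's Expression / ObjectExpression types.
@dataclass
class Column:
    name: str

@dataclass
class Function:
    name: str
    args: list

@dataclass
class Select:
    named_expressions: List[Tuple[str, object]]  # (alias, expression) pairs
    source: str

def wrap_aggregation(agg_name: str, mapped: Select) -> Select:
    """Rewrite `source |> List.map .field |> List.<agg>`:
    take the single-column Select produced by the List.map step, wrap its
    expression in the aggregation function, and keep the alias and source."""
    if len(mapped.named_expressions) != 1:
        raise ValueError("expected a Select with exactly one named expression")
    alias, inner = mapped.named_expressions[0]
    return Select([(alias, Function(agg_name, [inner]))], mapped.source)

# `source |> List.map .ageOfItem` as a single-column Select:
mapped = Select([("ageOfItem", Column("ageOfItem"))], "source")
# `... |> List.minimum` wraps the column in an aggregation:
aggregated = wrap_aggregation("min", mapped)
```

The result corresponds to the Spark code source.select(min(col("ageOfItem")).alias("ageOfItem")).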

Aggregations grouped in a tuple

See also #792

  • When objectExpressionFromValue receives a tuple constructor, create ObjectExpressions for every value inside the tuple constructor.
  • Apply the constraint that the created ObjectExpressions are only Selects with one NamedExpression, and they all have identical Sources.
  • Use all the NamedExpressions as the NamedExpressions inside a new Select with the Source they all share.

Note: I'm not sure yet how filter expressions should be handled here, too.
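The tuple case can be sketched the same way, again with hypothetical stand-in types rather than the actual backend code (the constraint checks mirror the bullets above):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Select:
    named_expressions: List[Tuple[str, object]]  # (alias, expression) pairs
    source: str

def merge_tuple_selects(selects: List[Select]) -> Select:
    """Combine the Select built for each element of a tuple constructor
    into one Select: every element must be a single-NamedExpression Select,
    and all of them must share an identical source."""
    sources = {s.source for s in selects}
    if len(sources) != 1:
        raise ValueError("all tuple elements must select from the same source")
    named = []
    for s in selects:
        if len(s.named_expressions) != 1:
            raise ValueError("each tuple element must be a single-column Select")
        named.extend(s.named_expressions)
    return Select(named, sources.pop())
```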

Aggregations grouped in a Record

There are no issues or examples for this use case, but it's provided for completeness.

The process is identical to the tuple case except we also have our own names to replace for each of the NamedExpressions.
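As a sketch (same hypothetical stand-in types as before, not the actual backend code), the only difference from the tuple case is that the record's own field names replace the aliases of the inner NamedExpressions:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Select:
    named_expressions: List[Tuple[str, object]]  # (alias, expression) pairs
    source: str

def merge_record_selects(fields: List[Tuple[str, Select]]) -> Select:
    """fields: (record field name, single-column Select) pairs.
    As in the tuple case, all Selects must share one source; the record's
    field names become the aliases of the merged NamedExpressions."""
    sources = {s.source for _, s in fields}
    if len(sources) != 1:
        raise ValueError("all record fields must select from the same source")
    named = []
    for field_name, s in fields:
        if len(s.named_expressions) != 1:
            raise ValueError("each record field must be a single-column Select")
        _, expr = s.named_expressions[0]
        named.append((field_name, expr))
    return Select(named, sources.pop())
```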

Aggregations grouped in a singleton List of a Record

AKA the Pedantic Elm case. This has already been implemented, but could be simplified and made more readable once the above is in place.

jonathanmaw avatar Aug 12 '22 15:08 jonathanmaw