
Implement transpilation of List.minimum in Spark

Open jonathanmaw opened this issue 2 years ago • 6 comments

jonathanmaw, Jul 05 '22 15:07

This appears to combine List.map functionality with aggregation support. Reducing the Christmas Bonanza example to its simplest form gives:

source
    |> List.map .priceValue
    |> (\listOfPrices ->
            [{ foo = List.minimum listOfPrices
            }]
       )

(As every generated Spark function ultimately returns a DataFrame, I generate a List of Records in the lambda instead of a tuple.)

jonathanmaw, Jul 06 '22 13:07

An equivalent in Spark would be:

source.select(
    org.apache.spark.sql.functions.min(
        org.apache.spark.sql.functions.col("priceValue")
    ).alias("foo")
)

In Spark IR, this would be:

Select
    [ ( Name "foo"
      , Function
            "min"
            [ Column "priceValue" ]
      )
    ]
    (From (ObjectName "source"))

jonathanmaw, Jul 06 '22 13:07

To implement this, I think objectExpressionFromValue would need to match a Lambda applied to the result of a List.map, and present the whole thing as a single Select.

jonathanmaw, Jul 06 '22 14:07

I wonder whether considering a general aggregation case might be instructive. For example, as Elm:

source = [{cost=1, weight=2}, {cost=3, weight=4}]
testMultiple source =
    [{
        lightest =
            source |> List.map .weight |> List.minimum,
        dearest =
            source |> List.map .cost |> List.maximum
    }]

As Spark:

source.select(
    org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.col("weight")).alias("lightest"),
    org.apache.spark.sql.functions.max(org.apache.spark.sql.functions.col("cost")).alias("dearest"),
)
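For comparison, here is a rough sketch of how the same query might look as an IR value. The Expression and ObjectExpression types below are simplified stand-ins written for this comment, not the real Spark IR definitions, and Aggregation, selectAggregations and testMultipleIR are hypothetical names.

```elm
module AggregationSketch exposing (..)

-- Simplified stand-ins for the Spark IR; assumptions for illustration only.


type Expression
    = Column String
    | Function String (List Expression)


type ObjectExpression
    = From String
    | Select (List ( String, Expression )) ObjectExpression


-- Hypothetical description of one aggregated output field.
type alias Aggregation =
    { fieldName : String
    , functionName : String
    , columnName : String
    }


-- Build one Select holding every aggregation, mirroring a single
-- source.select(...) call with several aliased columns.
selectAggregations : List Aggregation -> String -> ObjectExpression
selectAggregations aggregations sourceName =
    Select
        (List.map
            (\agg -> ( agg.fieldName, Function agg.functionName [ Column agg.columnName ] ))
            aggregations
        )
        (From sourceName)


-- The testMultiple example expressed with the stand-in types.
testMultipleIR : ObjectExpression
testMultipleIR =
    selectAggregations
        [ { fieldName = "lightest", functionName = "min", columnName = "weight" }
        , { fieldName = "dearest", functionName = "max", columnName = "cost" }
        ]
        "source"
```

The point of the sketch is that each record field becomes one named expression, so several aggregations still fit in a single Select over the same source.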

jonathanmaw, Jul 06 '22 14:07

After a lot of head-scratching, I think I'm closer to a working implementation.

I need to match against Apply _ (Lambda _ _ (List _ [Record _ fields])) sourceRelation. If objectExpressionFromValue ir sourceRelation produces a Select containing exactly one NamedExpression, I create a new Select that uses the Expression inside that NamedExpression as the argument to my "min" function.

i.e., expect

Select [ ( _, subExpression ) ] sourceExpression
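The rewrite step described above could be sketched roughly as follows. This is a minimal sketch, not the code in the pull request: the two types are simplified stand-ins for the Spark IR, and aggregateOver and example are hypothetical names introduced here.

```elm
module MinimumSketch exposing (..)

-- Simplified stand-ins for the Spark IR; assumptions for illustration only.


type Expression
    = Column String
    | Function String (List Expression)


type ObjectExpression
    = From String
    | Select (List ( String, Expression )) ObjectExpression


-- Wrap the single expression selected by `inner` in an aggregate function
-- and name the result `fieldName`. Returns Nothing unless `inner` is a
-- Select with exactly one named expression.
aggregateOver : String -> String -> ObjectExpression -> Maybe ObjectExpression
aggregateOver functionName fieldName inner =
    case inner of
        Select [ ( _, subExpression ) ] sourceExpression ->
            Just
                (Select
                    [ ( fieldName, Function functionName [ subExpression ] ) ]
                    sourceExpression
                )

        _ ->
            Nothing


-- Stands in for what translating `source |> List.map .priceValue` might give.
example : Maybe ObjectExpression
example =
    aggregateOver "min"
        "foo"
        (Select [ ( "priceValue", Column "priceValue" ) ] (From "source"))
```

Here example evaluates to Just (Select [ ( "foo", Function "min" [ Column "priceValue" ] ) ] (From "source")). Any other shape, such as a Select with two named expressions, falls through to Nothing, so unsupported inputs are rejected rather than silently mistranslated.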

jonathanmaw, Jul 08 '22 18:07

I have opened pull request https://github.com/finos/morphir-elm/pull/804 for this.

jonathanmaw, Jul 19 '22 15:07