
Support transpilation of Antique.elm's Christmas bonanza example in Spark

Open rdale opened this issue 2 years ago • 6 comments

i.e. the function https://github.com/finos/morphir-elm/blob/bd523b6d2b870db0b81068f954ac475262f61189/tests-integration/reference-model/src/Morphir/Reference/Model/Sample/Rules/Income/Antique.elm#L110

It uses List.minimum and List.maximum, and instead of returning a boolean for each individual Antique, it takes the whole list of Antiques and returns a PriceRange.

rdale avatar Jul 05 '22 15:07 rdale

This issue depends on #793 and #794

jonathanmaw avatar Jul 05 '22 15:07 jonathanmaw

From an example of

testTuple : List { ageOfItem : Float } -> (Float, Maybe Float)
testTuple antiques =
    ( antiques |> List.map .ageOfItem |> List.sum
    , antiques |> List.map .ageOfItem |> List.minimum
    )

I currently have generated code

  def testTuple(
    antiques: org.apache.spark.sql.DataFrame
  ): org.apache.spark.sql.DataFrame =
    antiques.select(
      org.apache.spark.sql.functions.sum(org.apache.spark.sql.functions.col("ageOfItem")).alias("ageOfItem"),
      org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.col("ageOfItem")).alias("ageOfItem")
    )

This is a problem, since the automatically-generated alias for the expression will be a duplicate whenever different aggregations are performed on the same field.

I think the sensible thing to do here would be to drop the .alias() statement in the output, which means making it optional in Morphir's Spark transpiler and ensuring it's not set when generated from a tuple.
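A small sketch (hypothetical helper, not the transpiler's actual code) of why the generated aliases collide and why dropping them helps: aliasing every aggregation with the source field name produces duplicates, while Spark-style auto-generated names stay distinct.

```python
def column_names(aggregations, use_alias):
    # aggregations: list of (function_name, field) pairs
    names = []
    for fn, field in aggregations:
        if use_alias:
            names.append(field)             # .alias("ageOfItem") on every expression
        else:
            names.append(f"{fn}({field})")  # Spark-style auto-generated column name
    return names

aggs = [("sum", "ageOfItem"), ("min", "ageOfItem")]
print(column_names(aggs, use_alias=True))   # ['ageOfItem', 'ageOfItem'] -- duplicates
print(column_names(aggs, use_alias=False))  # ['sum(ageOfItem)', 'min(ageOfItem)']
```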

jonathanmaw avatar Sep 08 '22 13:09 jonathanmaw

The Christmas bonanza function is a bit more complex than a simple tuple. The key difference is that it's an Apply of a Lambda of a tuple to a source relation, i.e.

source
  |> List.map .ageOfItem
  |> (\a ->
        ( List.minimum a
        , List.maximum a
        )
     )

As Spark, this would resemble

source.select(
  min(col("ageOfItem")),
  max(col("ageOfItem"))
)

If I can inline the parts outside of the lambda, the Elm code becomes

( source |> List.map .ageOfItem |> List.minimum
, source |> List.map .ageOfItem |> List.maximum
)

which is something I have already solved.
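A tiny plain-Python stand-in for the Elm shows why the two forms are interchangeable: applying the lambda to the list and inlining the list into the tuple compute the same pair.

```python
ages = [4.0, 9.0, 2.0]

# Lambda form: apply a function of `a` to the source list.
lambda_form = (lambda a: (min(a), max(a)))(ages)

# Inlined form: substitute the source list directly into the tuple.
inlined_form = (min(ages), max(ages))

print(lambda_form == inlined_form)  # True
```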

jonathanmaw avatar Sep 09 '22 17:09 jonathanmaw

The process for inlining this appears to be: Match against Value.Apply _ ((Value.Lambda _ lambdaArg (Value.Tuple _ relations)) as lam) sourceValue and call

relations
    |> List.map (inlineArguments (collectLambdaParams lam []) sourceValue)

to inline the sourceValue into the relations. Then I can treat it like my regular Tuple handling.
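The match-and-inline step above can be sketched in plain Python over a toy AST (the constructor and helper names here are illustrative, not Morphir's IR): substitute the source value for every use of the lambda parameter inside each relation of the tuple.

```python
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Call:          # e.g. Call("List.minimum", Var("a"))
    fn: str
    arg: object

@dataclass
class Tup:
    items: tuple

@dataclass
class Lambda:
    param: str
    body: object

@dataclass
class Apply:
    fn: object
    arg: object

def inline(expr, param, source):
    # Substitute `source` for every Var(param) inside `expr`.
    if isinstance(expr, Var) and expr.name == param:
        return source
    if isinstance(expr, Call):
        return Call(expr.fn, inline(expr.arg, param, source))
    return expr

def rewrite(expr):
    # Match Apply(Lambda(param, Tup(relations)), source) and inline
    # `source` into each relation, leaving a plain tuple to handle.
    if (isinstance(expr, Apply) and isinstance(expr.fn, Lambda)
            and isinstance(expr.fn.body, Tup)):
        lam, source = expr.fn, expr.arg
        return Tup(tuple(inline(r, lam.param, source) for r in lam.body.items))
    return expr

expr = Apply(Lambda("a", Tup((Call("List.minimum", Var("a")),
                              Call("List.maximum", Var("a"))))),
             Var("agesSource"))
print(rewrite(expr))
```

After the rewrite, both tuple elements refer directly to the source relation, so the existing Tuple handling applies unchanged.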

jonathanmaw avatar Sep 09 '22 17:09 jonathanmaw

Inlining seemed to work for the simple case of

testApplyTuple : List { ageOfItem : Float } -> (Float, Maybe Float)
testApplyTuple antiques =
    antiques
        |> List.map .ageOfItem
        |> (\ages ->
                ( List.sum ages
                , List.minimum ages
                )
           )

becoming

  def testApplyTuple(
    antiques: org.apache.spark.sql.DataFrame
  ): org.apache.spark.sql.DataFrame =
    antiques.select(
      org.apache.spark.sql.functions.sum(org.apache.spark.sql.functions.col("ageOfItem")).alias("ageOfItem"),
      org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.col("ageOfItem")).alias("ageOfItem")
    )

However, it's not working for something as complex as the Christmas bonanza example. I'm not sure whether inlineLetDef is producing the desired results, or whether I'm confused about what it does. I know that inlineLetDef is being called on it, but Value.Apply still appears to be applied to a Value.Variable instead of a block of code. The literal 0.0, which comes from inside the LetDefinition, is visible in the body of the code after inlineLetDef is called, so I think the code has been inserted, but I don't know what the Value.Variables are doing there now.

jonathanmaw avatar Sep 12 '22 16:09 jonathanmaw

Having taken some time to take the error message apart and see what's in the IR, I noticed that it was multiplying priceValue by getPriceValue, and the value of the bonanzaDiscount FloatLiteral 0.15 was completely absent.

That led me to suspect inlineLetDef is inlining values into the wrong place. I created

testLetDef : List AntiqueSubset -> List AntiqueSubset
testLetDef source =
    let
        max = 20.0
        min = 10.0
    in
    source
        |> List.filter
            (\antique ->
                (antique.ageOfItem <= max) && (antique.ageOfItem >= min)
            )

and this gets transpiled into

  def testLetDef(
    source: org.apache.spark.sql.DataFrame
  ): org.apache.spark.sql.DataFrame =
    source.filter(((org.apache.spark.sql.functions.col("ageOfItem")) <= (10)) and ((org.apache.spark.sql.functions.col("ageOfItem")) >= (20)))

i.e. it goes from the condition of ageOfItem being <= 20.0 and >= 10.0, to <= 10 and >= 20.
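A plain-Python analogue (with assumed sample data) makes the observable effect of the swap concrete: the intended filter keeps ages in [10, 20], while the transpiled condition with the bounds exchanged is unsatisfiable and keeps nothing.

```python
# Assumed sample data for illustration.
antiques = [{"ageOfItem": a} for a in [5.0, 12.0, 18.0, 25.0]]

# Intended: ageOfItem <= 20.0 and ageOfItem >= 10.0
intended = [r["ageOfItem"] for r in antiques
            if 10.0 <= r["ageOfItem"] <= 20.0]

# Transpiled: ageOfItem <= 10 and ageOfItem >= 20 (bounds swapped)
swapped = [r["ageOfItem"] for r in antiques
           if r["ageOfItem"] <= 10.0 and r["ageOfItem"] >= 20.0]

print(intended)  # [12.0, 18.0]
print(swapped)   # []
```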

jonathanmaw avatar Sep 13 '22 13:09 jonathanmaw