Support transpilation of Antique.elm's christmas bonanza example in Spark
i.e. the function https://github.com/finos/morphir-elm/blob/bd523b6d2b870db0b81068f954ac475262f61189/tests-integration/reference-model/src/Morphir/Reference/Model/Sample/Rules/Income/Antique.elm#L110
It uses `List.minimum` and `List.maximum`, and instead of returning a boolean for each individual Antique, it takes the whole list of Antiques and returns a PriceRange.
This issue depends on #793 and #794
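For context, here is a rough sketch of the shape of that function, reconstructed purely from the behaviour described in this issue; the names and the exact discount arithmetic are assumptions, and the real definition is the one in the linked Antique.elm:

```elm
-- Illustrative sketch only; see the linked Antique.elm for the real definition.
-- Everything except List.minimum/List.maximum and bonanzaDiscount is assumed.
type alias PriceRange =
    ( Maybe Float, Maybe Float )


christmasBonanza : List { priceValue : Float } -> PriceRange
christmasBonanza antiques =
    let
        bonanzaDiscount =
            0.15
    in
    antiques
        |> List.map (\antique -> antique.priceValue * bonanzaDiscount)
        |> (\discountedPrices ->
                ( List.minimum discountedPrices
                , List.maximum discountedPrices
                )
           )
```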
From an example of
```elm
testTuple : List { ageOfItem : Float } -> (Float, Maybe Float)
testTuple antiques =
    ( antiques |> List.map .ageOfItem |> List.sum
    , antiques |> List.map .ageOfItem |> List.minimum
    )
```
I currently get the following generated code:
```scala
def testTuple(
    antiques: org.apache.spark.sql.DataFrame
): org.apache.spark.sql.DataFrame =
  antiques.select(
    org.apache.spark.sql.functions.sum(org.apache.spark.sql.functions.col("ageOfItem")).alias("ageOfItem"),
    org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.col("ageOfItem")).alias("ageOfItem")
  )
```
This is a problem, since the automatically generated alias for the expression will be a duplicate whenever different aggregations are performed on the same field.
I think the sensible thing to do here would be to drop the `.alias()` call from the output, which means making the alias optional in Morphir's Spark transpiler and ensuring it is not set when the select is generated from a tuple.
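As a sketch of what "optional" could look like on the Elm side, here is a hypothetical column representation where the output name may be absent (these types and names are purely illustrative, not the actual Morphir Spark AST):

```elm
-- Hypothetical column representation in which the alias may be absent.
type Expression
    = Column String
    | Sum Expression
    | Min Expression


type alias SelectColumn =
    { expression : Expression
    , outputName : Maybe String -- Nothing => emit no .alias(...) in the generated Scala
    }


-- Columns that come from a tuple expression would simply leave the name unset.
fromTupleElement : Expression -> SelectColumn
fromTupleElement expression =
    { expression = expression
    , outputName = Nothing
    }
```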
The christmas bonanza function is a bit more complex than a simple tuple. The key difference is that it is an `Apply` of a `Lambda` of a `Tuple` to a source relation, i.e.
```elm
source
    |> List.map .ageOfItem
    |> (\a ->
            ( List.minimum a
            , List.maximum a
            )
       )
```
In Spark, this would resemble
```scala
source.select(
  min(col("ageOfItem")),
  max(col("ageOfItem"))
)
```
If I can inline the parts outside of the lambda, the Elm code becomes
```elm
( source |> List.map .ageOfItem |> List.minimum
, source |> List.map .ageOfItem |> List.maximum
)
```
which is something I have already solved.
The process for inlining this appears to be: match against

```elm
Value.Apply _ ((Value.Lambda _ lambdaArg (Value.Tuple _ relations)) as lam) sourceValue
```

and call

```elm
relations
    |> List.map (inlineArguments (collectLambdaParams lam []) sourceValue)
```

to inline the sourceValue into the relations. Then I can treat it like my regular Tuple handling.
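Put together, the rewrite could look roughly like the following sketch. It assumes `inlineArguments` and `collectLambdaParams` behave exactly as in the call above and uses the Morphir IR constructors from the pattern; the function name itself is made up:

```elm
-- Sketch: rewrite `Apply (Lambda _ (Tuple relations)) sourceValue` into a plain
-- Tuple with sourceValue substituted for the lambda parameter, so the existing
-- Tuple handling can take over unchanged.
inlineAppliedTuple value =
    case value of
        Value.Apply _ ((Value.Lambda _ _ (Value.Tuple tupleAttr relations)) as lam) sourceValue ->
            relations
                |> List.map (inlineArguments (collectLambdaParams lam []) sourceValue)
                |> Value.Tuple tupleAttr

        other ->
            other
```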
Inlining seemed to work for the simple case of
```elm
testApplyTuple : List { ageOfItem : Float } -> (Float, Maybe Float)
testApplyTuple antiques =
    antiques
        |> List.map .ageOfItem
        |> (\ages ->
                ( List.sum ages
                , List.minimum ages
                )
           )
```
becoming
```scala
def testApplyTuple(
    antiques: org.apache.spark.sql.DataFrame
): org.apache.spark.sql.DataFrame =
  antiques.select(
    org.apache.spark.sql.functions.sum(org.apache.spark.sql.functions.col("ageOfItem")).alias("ageOfItem"),
    org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.col("ageOfItem")).alias("ageOfItem")
  )
```
However, it's not working for something as complex as the christmas bonanza example.
I'm not sure whether `inlineLetDef` is producing the desired results or whether I'm confused about what it does.
I know that `inlineLetDef` is being called on it, but `Value.Apply` still appears to be applied to a `Value.Variable` instead of a block of code.
The literal `0.0`, which comes from inside the `LetDefinition`, is visible in the body of the code after `inlineLetDef` is called, so I think the code has been inserted, but I don't know what the `Value.Variable`s are doing there now.
Having taken some time to pick the error message apart and look at what's in the IR, I noticed that it was multiplying `priceValue` by `getPriceValue`, and the value of `bonanzaDiscount` (`FloatLiteral 0.15`) was completely absent.
That led me to suspect `inlineLetDef` is inlining values into the wrong place. I created
```elm
testLetDef : List AntiqueSubset -> List AntiqueSubset
testLetDef source =
    let
        max = 20.0
        min = 10.0
    in
    source
        |> List.filter
            (\antique ->
                (antique.ageOfItem <= max) && (antique.ageOfItem >= min)
            )
```
and this gets transpiled into
```scala
def testLetDef(
    source: org.apache.spark.sql.DataFrame
): org.apache.spark.sql.DataFrame =
  source.filter(((org.apache.spark.sql.functions.col("ageOfItem")) <= (10)) and ((org.apache.spark.sql.functions.col("ageOfItem")) >= (20)))
```
i.e. the condition goes from `ageOfItem` being <= 20.0 and >= 10.0 to being <= 10 and >= 20: the two let-bound values have been swapped.
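For comparison, substituting the let bindings by hand gives the behaviour I'd expect to be preserved (the function name here is just for illustration):

```elm
-- What testLetDef is equivalent to once max = 20.0 and min = 10.0 are inlined
-- correctly; the generated filter should therefore use <= 20.0 and >= 10.0.
testLetDefInlined : List AntiqueSubset -> List AntiqueSubset
testLetDefInlined source =
    source
        |> List.filter
            (\antique ->
                (antique.ageOfItem <= 20.0) && (antique.ageOfItem >= 10.0)
            )
```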